REGENERATIVE TREE GROWTH: BINARY SELF-SIMILAR
CONTINUUM RANDOM TREES AND POISSON-DIRICHLET
COMPOSITIONS

By Jim Pitman and Matthias Winkel 

University of California, Berkeley and University of Oxford 

We use a natural ordered extension of the Chinese Restaurant 
Process to grow a two-parameter family of binary self-similar contin- 
uum fragmentation trees. We provide an explicit embedding of Ford's 
sequence of alpha model trees in the continuum tree which we iden- 
tified in a previous article as a distributional scaling limit of Ford's 
trees. In general, the Markov branching trees induced by the two- 
parameter growth rule are not sampling consistent, so the existence 
of compact limiting trees cannot be deduced from previous work on 
the sampling consistent case. We develop here a new approach to es- 
tablish such limits, based on regenerative interval partitions and the 
urn-model description of sampling from Dirichlet random distribu- 
tions. 

1. Introduction. We are interested in growth schemes for random rooted
binary trees $T_n$ with $n$ leaves labeled by $[n] = \{1, \ldots, n\}$ of the following
general form.

Definition 1. Let $T_1$ be the tree with a single edge joining a root vertex
and a leaf vertex labeled 1. Let $T_2$ be the Y-shaped tree consisting of a root
and leaves labeled 1 and 2, each connected by an edge to a unique branch
point.

To create $T_{n+1}$ from $T_n$, select an edge of $T_n$, say, $a_n \to c_n$, directed away
from the root, and replace it by three edges $a_n \to b_n$, $b_n \to c_n$ and $b_n \to n+1$,
so that two new edges connect the two vertices $a_n$ and $c_n$ to a new branch
point $b_n$ and a further edge connects $b_n$ to a new leaf labeled $n+1$.

A binary tree growth process is a sequence $(T_n, n \ge 1)$ of random trees
constructed in this way where at each step the edge $a_n \to c_n$ is selected
randomly according to some selection rule, meaning a conditional distribution
given $T_n$ for an edge of $T_n$. Given a selection rule, each tree $T_n$ has a
distribution on the space $\mathbb{T}_{[n]}$ of rooted binary trees with $n$ leaves labeled $[n]$,
and the selection rule specifies for all $n \ge 1$ conditional distributions of $T_{n+1}$
given $T_n$.

The uniform rule, where each of the $2n-1$ edges of $T_n$ is selected with
equal probability, gives a known binary tree growth process [25] related to
the Brownian continuum random tree [1, 24]. Ford [10] introduced a
one-parameter family of binary tree growth processes, where the selection rule
for $0 \le \alpha \le 1$ is as follows:

(i) Given $T_n$ for $n \ge 2$, assign a weight $1-\alpha$ to each of the $n$ edges adjacent
to a leaf, and a weight $\alpha$ to each of the $n-1$ other edges.

(ii) Select an edge of $T_n$ at random with probabilities proportional to the
weights assigned by step (i).

For us, this selection rule will be the $(\alpha, 1-\alpha)$-rule. Note that $\alpha = 1/2$ gives
the uniform rule.
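As a small illustration of steps (i) and (ii), the following Python sketch computes the selection probability of a single leaf edge and of a single non-leaf edge under Ford's rule. The function name `ford_edge_probabilities` is ours and does not appear in the source; exact rational arithmetic is used so that the case $\alpha = 1/2$ can be compared with the uniform rule without rounding.

```python
from fractions import Fraction

def ford_edge_probabilities(n, alpha):
    """Return (p_leaf_edge, p_other_edge) for a tree with n >= 2 leaves.

    Step (i): weight 1 - alpha on each of the n leaf edges and weight
    alpha on each of the n - 1 other edges. Step (ii): normalize.
    """
    if n < 2:
        raise ValueError("Ford's rule is stated for n >= 2")
    alpha = Fraction(alpha)
    total = n * (1 - alpha) + (n - 1) * alpha   # equals n - alpha
    return (1 - alpha) / total, alpha / total

# For alpha = 1/2 both edge types get the same probability 1 / (2n - 1).
p_leaf, p_other = ford_edge_probabilities(5, Fraction(1, 2))
```

The total weight is $n - \alpha$, so the two returned values are $(1-\alpha)/(n-\alpha)$ and $\alpha/(n-\alpha)$.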

In [18] we showed that, also for $\alpha \ne 1/2$, the trees $T_n$ with leaf labels
removed, denoted $T_n^\circ$, have a continuum fragmentation tree $\mathcal{T}^\alpha$ as their
distributional scaling limit, when considered as $\mathbb{R}$-trees with unit edge lengths:
$n^{-\alpha} T_n^\circ \to \mathcal{T}^\alpha$ in distribution for the Gromov-Hausdorff topology. However,
in the main part of [18] and in all other fragmentation literature we are
aware of, the labeling of leaves is exchangeable, while the labeling of leaves
in order of appearance in the trees $T_n$ grown using the $(\alpha, 1-\alpha)$-rule is
not. Our results in [18] applied because of a weak sampling consistency
of the $(\alpha, 1-\alpha)$-trees; cf. [10]. The subtlety with these trees is that they
are strongly sampling consistent in the sense defined in Definition 2 only if
$\alpha = 1/2$; cf. [18].

Definition 2. A binary tree growth process $(T_n, n \ge 1)$ is called weakly
sampling consistent if the distributions of the delabeled trees $T_n^\circ$ and $\hat{T}_n^\circ$
coincide for all $n \ge 1$, where $\hat{T}_n^\circ$ is obtained from $T_{n+1}^\circ$ by removal of a
leaf chosen uniformly at random. The process is called strongly sampling
consistent if the distributions of $(T_n^\circ, T_{n+1}^\circ)$ and $(\hat{T}_n^\circ, T_{n+1}^\circ)$ coincide for all
$n \ge 1$.

In this paper we take up the study of nonexchangeable labeling and the
role of weak sampling consistency for a two-parameter extension of the
$(\alpha, 1-\alpha)$-rule; cf. Figure 1.

Fig. 1. Recursive tree growth: in this scenario, the recursion consists of two steps.
Weights for root edge and subtrees are displayed for the first step. The subtree $T_{n,1}$ is
selected. Within tree $T_{n,1}$, the root edge is selected. Leaf 6 is inserted at the selected edge.

Definition 3. Let $0 \le \alpha \le 1$ and $\theta \ge 0$. We define the $(\alpha, \theta)$-selection
rule as follows:

(i)rec For $n \ge 2$, the tree $T_n$ branches at the branch point adjacent to the
root into two subtrees $T_{n,0}$ and $T_{n,1}$. Given these are of sizes $m$ and
$n-m$, say, where $T_{n,1}$ contains the smallest label in $T_n$, assign weight
$\alpha$ to the edge connecting the root and the adjacent branch point, and
weights $m-\alpha$ and $n-m-1+\theta$, respectively, to the subtrees.

(ii)rec Select the root edge or a subtree with probabilities proportional to
these weights. If a subtree with two or more leaves was selected,
recursively apply the weighting procedure (i)rec to the selected subtree,
until the root edge or a subtree with a single leaf was selected. If a
subtree with a single leaf was selected, select the unique edge of this
subtree.

A binary tree growth process $(T_n^{\alpha,\theta}, n \ge 1)$ grown via the $(\alpha, \theta)$-rules (i)rec,
(ii)rec, for some $0 \le \alpha \le 1$ and $\theta \ge 0$, is called an $(\alpha, \theta)$-tree growth process.

For $\theta = 1-\alpha$, each edge is chosen with the same probabilities as with
Ford's rules (i) and (ii).
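The recursion in Definition 3 can be made concrete with a short Python sketch that computes the exact selection probability of every edge. The representation is our own assumption, not the paper's: a tree is a nested pair, a leaf is an integer label, and an edge is identified by the set of leaf labels below it. The names `leaf_labels` and `edge_probabilities` and the five-leaf example tree are hypothetical.

```python
from fractions import Fraction

def leaf_labels(tree):
    """Leaves of a nested-pair tree: a leaf is an int, a branch point a pair."""
    if isinstance(tree, int):
        return [tree]
    return leaf_labels(tree[0]) + leaf_labels(tree[1])

def edge_probabilities(tree, alpha, theta, mass=Fraction(1), out=None):
    """Exact selection probability of every edge under rules (i)rec, (ii)rec.

    An edge is keyed by the frozenset of leaf labels below it, so the root
    edge of `tree` is keyed by the set of all its leaves.
    """
    out = {} if out is None else out
    labels = leaf_labels(tree)
    key = frozenset(labels)
    if len(labels) == 1:                 # single leaf: select its unique edge
        out[key] = mass
        return out
    n = len(labels)
    first, second = tree
    # T_{n,1} contains the smallest label; T_{n,0} is the other subtree.
    if min(labels) in leaf_labels(first):
        t1, t0 = first, second
    else:
        t1, t0 = second, first
    m = len(leaf_labels(t0))
    total = n - 1 + theta                # alpha + (m - alpha) + (n - m - 1 + theta)
    out[key] = mass * alpha / total      # root edge of this (sub)tree
    edge_probabilities(t0, alpha, theta, mass * (m - alpha) / total, out)
    edge_probabilities(t1, alpha, theta, mass * (n - m - 1 + theta) / total, out)
    return out

alpha = Fraction(1, 3)
tree = ((1, (3, 5)), (2, 4))             # a hypothetical tree with 5 leaves
probs = edge_probabilities(tree, alpha, 1 - alpha)
```

With $\theta = 1-\alpha$ every subtree of size $s$ receives weight $s-\alpha$ whether or not it contains the smallest label, which is why the output agrees with Ford's weights $(1-\alpha)/(n-\alpha)$ for leaf edges and $\alpha/(n-\alpha)$ for the other edges.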

The boundary cases $\alpha = 0$ and $\alpha = 1$ are special and easy to describe
(see Section 3.2). Growth is then linear or logarithmic in height, and scaling
limits have a degenerate branching structure. We therefore focus on the
parameter range $0 < \alpha < 1$ and study scaling limits and asymptotics of the
associated trees $T_n = T_n^{\alpha,\theta}$.

We pointed out in [18] that Ford's $(\alpha, 1-\alpha)$-tree growth process is
associated with a Chinese Restaurant Process (CRP) as follows. The height
$K_n$ of leaf 1 in $T_n$ increases whenever an edge on the path connecting 1
with the root, which we call the spine, is selected. Whenever a spinal edge
is selected, the edge is replaced by two new spinal edges and a new subtree
starts growing off the spine. If we call the subtrees off the spine tables and
the leaves in subtrees customers, then the process of table sizes follows the
$(\alpha, 1-\alpha)$ seating plan of a CRP in the terminology of [24]. Similarly, we
identify an $(\alpha, \theta)$ seating plan in the two-parameter model, meaning that
the $(n+1)$st customer is seated at the $j$th table, with $n_j$ customers already
seated, with probability proportional to $n_j - \alpha$ and at a new table with
probability proportional to $\theta + k\alpha$, if $k$ tables are occupied. See Figure 2.
Note that

(1)    the $k$th customer in the restaurant is labeled $(k+1)$ as leaf in the tree,

since leaf 1 is not in a subtree off the spine.

Fig. 2. The $(\alpha, \theta)$ tree growth procedure induces an ordered Chinese Restaurant Process.

The theory of CRPs [24] immediately gives us a.s. a limit height
$L_{\alpha,\theta} = \lim_{n\to\infty} K_n/n^\alpha$ of leaf 1, as well as limiting proportions $(P_1, P_2, \ldots)$ of leaves
in each subtree in birth order, that is, in the order of least numbered leaves
of subtrees, which can be represented as

$$(P_1, P_2, \ldots) = (W_1, \overline{W}_1 W_2, \overline{W}_1 \overline{W}_2 W_3, \ldots),$$

where the $W_i$ are independent, $W_i$ has a beta$(1-\alpha, \theta+i\alpha)$ distribution
on the unit interval, and $\overline{W}_i := 1 - W_i$. The distribution of the sequence
of ranked limiting proportions is then Poisson-Dirichlet with parameters
$(\alpha, \theta)$, for short PD$(\alpha, \theta)$.

However, this spinal decomposition of the tree also specifies the spinal
order, that is, the order in which subtrees are encountered on the spine from
the root to leaf 1 (from left to right in Figure 2). Note that due to the leaf
labeling and the sequential growth of $T_n$, $n \ge 1$, subtrees are identifiable and
keep their order throughout, which makes the spinal order consistent as $n$
varies. After the insertion of leaf $n+1$, the sizes of subtrees in birth order
and in spinal order form two compositions of $n$, $n \ge 1$. While the birth order
is well known to be size-biased, we show that the compositions in spinal
order form a regenerative composition structure in the sense of Gnedin and
Pitman [13], which is weakly sampling consistent for all $0 < \alpha < 1$ and $\theta \ge 0$,
but not strongly so unless $\theta = \alpha$ [Proposition 6(i) and (ii)].

Fig. 3. A tree equipped with strings of beads; crushing a bead into a new string of beads.

It follows from [13] in the strongly sampling consistent case $\theta = \alpha$ that
the rescaled compositions converge almost surely to the associated
regenerative interval partition and that the block containing leaf 2 is a size-biased
pick from the composition of $n$, or from the interval partition in the limit
$n \to \infty$. We obtain almost sure limiting results for the nonstrongly sampling
consistent compositions (and discrete local times) in spinal order
[Proposition 6(iii) and (iv)], and we solve the problem of finding leaf 2 in the
nonstrongly consistent case, for the spinal composition of $n$ (Lemma 9) and
for the limiting interval partition (Proposition 10). The limiting interval
partition arranges the limiting proportions $(P_1, P_2, \ldots)$ in spinal order. We
consider inverse local time $L^{-1}$ as a random distribution function on the
interval $[0, L_{\alpha,\theta}]$. Then $([0, L_{\alpha,\theta}], dL^{-1})$ is an $(\alpha, \theta)$-string of beads in the
following sense.

Definition 4. An interval $(I, \mu)$ equipped with a discrete measure $\mu$ is
called a string of beads. We refer to the weighted random interval $([0, L_{\alpha,\theta}], dL^{-1})$
associated with an $(\alpha, \theta)$-regenerative partition as $(\alpha, \theta)$-string of
beads. We will also use this term for isometric copies of weighted intervals
as in Figure 3.

As a by-product of these developments (Corollary 8), we obtain a
sequential construction of the interval partition associated with the $(\alpha, \theta)$
regenerative composition structure described in [13], Section 8. This provides a
much more combinatorial approach to the $(\alpha, \theta)$ regenerative interval
partition than was given in [13], and solves the problem, left open in [13],
of explicitly describing for general $(\alpha, \theta)$ how interval lengths governed by
PD$(\alpha, \theta)$ should be ordered to form an $(\alpha, \theta)$ regenerative interval partition
of $[0,1]$ (Corollary 7).

We formulate and prove these results in Section 2. While they are key
results for the study of the trees $T_n^{\alpha,\theta}$, they are also of independent interest
in a framework of an ordered CRP. This notion will be made precise there
and studied in some detail.

In Section 3 we formally introduce leaf-labeled rooted binary trees and
the Markov branching property. We show that the delabeled trees from the
$(\alpha, \theta)$-tree growth rules have the Markov branching property, and that the
labeled trees have a regenerative property, which reflects the recursive nature
of the growth rules (Proposition 11). We then study sampling consistency
as defined in Definition 2:

Proposition 1. Let $(T_n^{\alpha,\theta}, n \ge 1)$ be an $(\alpha, \theta)$-tree growth process for
some $0 < \alpha < 1$ and $\theta \ge 0$, and $T_n^{\alpha,\theta,\circ}$, $n \ge 1$, the associated delabeled trees.

(a) $T_n^{\alpha,\theta}$ has exchangeable leaf labels for all $n \ge 1$ if and only if $\alpha = \theta = 1/2$.

(b) $(T_n^{\alpha,\theta,\circ}, n \ge 1)$ is strongly sampling consistent if and only if $\alpha = \theta = 1/2$.

(c) $(T_n^{\alpha,\theta,\circ}, n \ge 1)$ is weakly sampling consistent if and only if $\theta = 1-\alpha$ or
$\theta = 2-\alpha$.

We actually show that the distributions of delabeled trees coincide for
$\theta = 1-\alpha$ and $\theta = 2-\alpha$, and do so only in these weakly sampling consistent
cases (Lemma 12).

The main contribution of this paper is to establish limiting continuum
random trees (CRTs) even without weak sampling consistency. For a tree $T_n$
labeled by $[n] = \{1, \ldots, n\}$, we denote by $S(T_n; [k])$ the smallest subtree of $T_n$
that contains the root and the leaves labeled $1, \ldots, k$. It will be convenient
to use Aldous's formalism of reduced trees with edge lengths: denote by
$R(T_n; [k])$ the tree $T_k$ with edges marked as follows; because of the growth
procedure each vertex of $T_k$ is also a vertex of $T_n$, and we mark each edge of
$T_k$ by the graph distance in $T_n$ of the two vertices that the edge connects.
First, we study the asymptotics of these reduced trees.

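Proposition 2. Let $(T_n^{\alpha,\theta}, n \ge 1)$ be an $(\alpha, \theta)$-tree growth process. If
$0 < \alpha < 1$ and $\theta \ge 0$, then

$$n^{-\alpha} R(T_n^{\alpha,\theta}; [k]) \to \mathcal{R}_k^{\alpha,\theta} \quad \text{almost surely as } n \to \infty,$$

in the sense that the $2k-1$ edge lengths of $R(T_n^{\alpha,\theta}; [k])$ scaled by $n^\alpha$ converge
almost surely as $n \to \infty$ to limiting edge lengths of a tree $\mathcal{R}_k^{\alpha,\theta}$, for all $k \ge 1$.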

We proved this in [18], Proposition 18, for Ford's $(\alpha, 1-\alpha)$-tree growth
process. As in [18], we will also provide an explicit description of the
distribution of $(\mathcal{R}_k^{\alpha,\theta}, k \ge 1)$. We will, in fact, prove a stronger statement for trees
$\mathcal{R}_k^{\alpha,\theta}$ where each edge has the structure of a string of beads that records
limiting proportions of leaves of subtrees as atoms on the branches
(Proposition 14 and Corollary 15). We deduce growth rules for the passage from $k$
to $k+1$ leaves for the limiting trees equipped with strings of beads
(Corollary 16). These are remarkably simple and consist of picking a bead (using
Proposition 10) and crushing the bead of size $s_k$, say, into $m_{k+1}$, where
$m_{k+1}/s_k \sim$ PD$(\alpha, \theta)$, arranging these as a new string of beads (using
Corollary 7), attaching them to the location of the bead, which now splits an edge
and the remainder of its string of beads into two, as illustrated in Figure 3.

In the $(\alpha, 1-\alpha)$ case, growth by crushing beads is closely connected to
growth rules for random recursive trees studied by Dong, Goldschmidt and
Martin [6]. Specifically, we can associate with $\mathcal{R}_k$ a tree $V_k$ with $k$ vertices
labeled by $[k]$ and infinitely many unlabeled vertices, all marked by weights;
let $V_1$ consist of a root labeled 1 and infinitely many unlabeled children
marked by the sequence $m_1$ of masses of the string of beads on $\mathcal{R}_1$; to
construct $V_{k+1}$ from $V_k$, identify the unlabeled leaf in $V_k$ marked by the size
of the chosen bead $s_k$, label it by $k+1$ and add infinitely many children
of vertex $k+1$, marked by the sizes $m_{k+1}$ of the crushed bead. The limit
$V_\infty$ is a recursive tree where all vertices have infinitely many children. We
show in this paper that the richer structure of $(\mathcal{R}_k, \mu_k)$, which includes edges
on which the atoms of $\mu_k$ are distributed, has a binary CRT as its limit.
In fact, $V_\infty$ can be constructed for general $(\alpha, \theta)$, but the purpose of [6]
was to establish a coagulation-fragmentation duality that only works for
$(\alpha, 1-\alpha)$. See also Blei, Griffiths and Jordan [5] for another application
of nested Chinese restaurant processes to define distributions on
infinitely-deep, infinitely-branching trees.

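Section 4 will establish CRT limits for the general $(\alpha, \theta)$-tree growth
process.

Theorem 3. In the setting of Proposition 2, there exists a CRT $\mathcal{T}^{\alpha,\theta}$ on
the same probability space such that we have for the delabeled trees $\mathcal{R}_k^{\alpha,\theta,\circ}$,
$k \ge 1$, associated with $\mathcal{R}_k^{\alpha,\theta}$, $k \ge 1$, that

$$\mathcal{R}_k^{\alpha,\theta,\circ} \to \mathcal{T}^{\alpha,\theta} \quad \text{almost surely as } k \to \infty, \text{ in the Gromov-Hausdorff topology.}$$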

In fact, CRTs such as $\mathcal{T}^{\alpha,\theta}$ are equipped with a mass measure $\mu$. We
can construct $\mu$ as the limit of the strings of beads that we constructed on
$\mathcal{R}_k^{\alpha,\theta}$ [see Corollary 23], using Evans' and Winter's [9] weighted
Gromov-Hausdorff convergence that we recall in Section 4.1.

It would be nice to replace the two-step limiting procedure of Proposition
2 and Theorem 3 for trees reduced to $k$ leaves, letting first $n \to \infty$ and then
$k \to \infty$, by a single statement:

Conjecture 1. In the setting of Proposition 2, we have

$$n^{-\alpha} T_n^{\alpha,\theta,\circ} \to \mathcal{T}^{\alpha,\theta} \quad \text{almost surely, as } n \to \infty,$$

for the Gromov-Hausdorff topology.

In [18] we used exchangeability to obtain fine tightness estimates and
establish convergence in probability for a wide class of exchangeable strongly
sampling consistent Markov branching trees. From this result we deduce a
convergence in distribution in the weakly sampling consistent cases $\theta = 1-\alpha$
and $\theta = 2-\alpha$, but without sampling consistency, this argument breaks down.

Our method to prove Theorem 3 uses an embedding of $(T_n^{\alpha,\theta}, n \ge 1)$ and
$(\mathcal{R}_k^{\alpha,\theta}, k \ge 1)$ in a given fragmentation CRT. For a rooted $\mathbb{R}$-tree $(\mathcal{T}, \rho)$ and
leaves $\Sigma_1, \ldots, \Sigma_k$ of $\mathcal{T}$, denote by $R(\mathcal{T}; \Sigma_1, \ldots, \Sigma_k)$ the smallest subtree of $\mathcal{T}$
that contains $\rho$ and $\Sigma_1, \ldots, \Sigma_k$. The family of binary fragmentation CRTs
$\mathcal{T}$ is parameterized by a self-similarity parameter $a > 0$ and a dislocation
measure $\nu(du)$, a sigma-finite measure on $[1/2, 1)$ with $\int_{[1/2,1)} (1-u)\,\nu(du) < \infty$;
see Section 4.1.

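Theorem 4. Let $(\mathcal{T}^{\alpha,\theta}, \rho, \mu)$ be a binary fragmentation CRT with root
$\rho$ and mass distribution $\mu$, associated with dislocation measure
$\nu_{\alpha,\theta}(du) = f^\circ_{\alpha,\theta}(u)\,du$, $1/2 < u < 1$, where

$$\Gamma(1-\alpha) f^\circ_{\alpha,\theta}(u) = \alpha\left(u^{\theta}(1-u)^{-\alpha-1} + u^{-\alpha-1}(1-u)^{\theta}\right) + \theta\left(u^{\theta-1}(1-u)^{-\alpha} + u^{-\alpha}(1-u)^{\theta-1}\right)$$

for some $0 < \alpha < 1$ and $\theta \ge 0$. Then there exists, on a suitably extended
probability space, a sequence $(\Sigma_n, n \ge 1)$ of random leaves of $\mathcal{T}^{\alpha,\theta}$, such that
$(R(\mathcal{T}^{\alpha,\theta}; \Sigma_1, \ldots, \Sigma_k), k \ge 1)$ has the same distribution as $(\mathcal{R}_k^{\alpha,\theta}, k \ge 1)$.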

With this embedding, the projection of the mass distribution $\mu$ of $\mathcal{T}^{\alpha,\theta}$
onto $R(\mathcal{T}^{\alpha,\theta}; \Sigma_1, \ldots, \Sigma_k)$ yields strings of beads with distributions as we
constructed them on $\mathcal{R}_k^{\alpha,\theta}$. See Proposition 20.

2. An ordered Chinese Restaurant Process and regenerative composition 
structures. 

2.1. Regenerative compositions. We recall in this subsection some
background on regenerative composition structures from [13]. A composition of $n$
is a sequence $(n_1, \ldots, n_k)$ of positive integers with sum $n$. A sequence of
random compositions $C_n$ of $n$ is called regenerative if, conditionally given that
the first part of $C_n$ equals $n_1$, the remaining parts of $C_n$ define a composition
of $n - n_1$ with the same distribution as $C_{n-n_1}$. Given any decrement matrix
$(q(n,m), 1 \le m \le n)$, there is an associated sequence $C_n$ of regenerative
random compositions of $n$ defined by specifying that $q(n, \cdot)$ is the distribution
of the first part of $C_n$. Thus, for each composition $(n_1, \ldots, n_k)$ of $n$,

(2)    $$P(C_n = (n_1, \ldots, n_k)) = q(n, n_1)\, q(n - n_1, n_2) \cdots q(n_{k-1} + n_k, n_{k-1})\, q(n_k, n_k).$$
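Formula (2) translates directly into a loop that peels off the first part repeatedly. The Python sketch below is illustrative only: the names `composition_probability`, `compositions` and `uniform_q` are ours, and `uniform_q` (first part uniform on $1, \ldots, n$) is a toy decrement matrix chosen as an assumption for the example, not one discussed in the source.

```python
from fractions import Fraction

def composition_probability(parts, q):
    """P(C_n = parts) from formula (2): peel off the first part repeatedly."""
    remaining = sum(parts)
    prob = Fraction(1)
    for part in parts:
        prob *= q(remaining, part)
        remaining -= part
    return prob

def compositions(n):
    """Yield every composition of n as a tuple of positive integers."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def uniform_q(n, m):
    """Toy decrement matrix (an assumption): first part uniform on 1..n."""
    return Fraction(1, n)

# Summing formula (2) over all compositions of 5 gives total mass one.
total = sum(composition_probability(c, uniform_q) for c in compositions(5))
```

Any matrix whose rows $q(n, \cdot)$ are probability distributions on $\{1, \ldots, n\}$ can be passed as `q`; the total mass is then one by induction on $n$.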

We regard a composition of $n$ as a distribution of identical balls in an
ordered sequence of boxes. For a sequence of compositions $(C_n, n \ge 1)$, let $\hat{C}_n$
denote the composition of $n$ obtained by removal of a ball chosen uniformly
at random from $C_{n+1}$, and discarding the empty box if the chosen ball is the
only one in its box. We call $(C_n, n \ge 1)$ weakly sampling consistent if $\hat{C}_n \stackrel{d}{=} C_n$
for every $n$, and strongly sampling consistent if $(\hat{C}_n, C_{n+1}) \stackrel{d}{=} (C_n, C_{n+1})$ for
every $n$. A detailed theory of the asymptotic behavior of weakly sampling
consistent sequences of regenerative compositions of $n$ (known as
composition structures) is provided in [13].
Now write

$$C_n = (N_{n,1}, N_{n,2}, \ldots, N_{n,K_n}) \quad \text{and let} \quad S_{n,k} = \sum_{j=1}^{k} N_{n,j},$$

where $N_{n,j} = 0$ for $j > K_n$. According to Gnedin and Pitman [13], if $(C_n, n \ge 1)$
is weakly sampling consistent, there is the following convergence in
distribution of random sets with respect to the Hausdorff metric on closed subsets
of $[0,1]$:

(3)    $$\{S_{n,k}/n, k \ge 0\} \xrightarrow[n \to \infty]{} \mathcal{Z} := \{1 - \exp(-\xi_t), t \ge 0\}^{\rm cl},$$

where the left-hand side is the random discrete set of values $S_{n,k}$ rescaled
onto $[0,1]$, and the right-hand side is the closure of the range of 1 minus the
exponential of a subordinator $(\xi_t, t \ge 0)$. If $(C_n, n \ge 1)$ is strongly sampling
consistent, then the convergence (3) holds also with convergence in
distribution replaced by almost sure convergence. The collection of open interval
components of $[0,1] \setminus \mathcal{Z}$ is then called the regenerative interval partition
associated with $(C_n, n \ge 1)$. In particular, a strongly sampling consistent
composition structure can be derived from $\mathcal{Z}$ by uniform sampling in $[0,1]$
using $\mathcal{Z}$ to separate parts.

The distribution of a subordinator $(\xi_t, t \ge 0)$ is encoded in its Laplace
exponent $\Phi$ as

$$E(e^{-s\xi_t}) = e^{-t\Phi(s)} \quad \text{where} \quad \Phi(s) = a + cs + \int_{(0,\infty)} (1 - e^{-sx})\,\Lambda(dx),$$

for all $s \ge 0$, $t \ge 0$, and characteristics $(a, c, \Lambda)$, where $a \ge 0$, $c \ge 0$ and $\Lambda$ is
a measure on $(0,\infty)$ with $\int_{(0,\infty)} (1 \wedge x)\,\Lambda(dx) < \infty$.

2.2. An ordered Chinese Restaurant Process. We will now use an ordered
version of the CRP to construct an exchangeable random partition $\Pi_{\alpha,\theta}$ of
$\mathbb{N}$ governed by the CRP as described in [24], jointly with a random total
ordering of the blocks (tables) of $\Pi_{\alpha,\theta}$. With a suitable encoding that we
make precise, this random total ordering is independent of $\Pi_{\alpha,\theta}$.

First recall the $(\alpha, \theta)$ CRP for fixed $0 \le \alpha \le 1$ and $\theta \ge 0$. Customers
labeled by $\mathbb{N} := \{1, 2, \ldots\}$ seat themselves at tables labeled by $\mathbb{N}$ in the
following way: Customer 1 sits at table 1. Given that $n$ customers have been
seated at $k$ different tables, with $n_i$ customers at table $i$ for $i \in [k]$, customer
$n+1$

• sits at the $i$th occupied table with probability $(n_i - \alpha)/(n + \theta)$, for $i \in [k]$;

• sits alone at table $k+1$ with probability $(k\alpha + \theta)/(n + \theta)$.

The state of the system after $n$ customers have been seated is a random
partition $\Pi_n$ of $[n]$. By construction, these partitions are exchangeable, and
consistent as $n$ varies so they induce a random partition $\Pi_\infty$ of $\mathbb{N}$ whose
restriction to $[n]$ is $\Pi_n$.
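The seating rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the source: the names `seating_probabilities` and `simulate_crp` and the parameter values are our own, and tables are tracked only by their sizes in order of appearance.

```python
import random
from fractions import Fraction

def seating_probabilities(sizes, alpha, theta):
    """Probabilities for customer n + 1: one per occupied table, then a new table."""
    n, k = sum(sizes), len(sizes)
    probs = [(n_i - alpha) / (n + theta) for n_i in sizes]
    probs.append((k * alpha + theta) / (n + theta))
    return probs

def simulate_crp(n, alpha, theta, rng):
    """Seat n >= 1 customers; return table sizes in order of appearance."""
    sizes = [1]                                   # customer 1 sits at table 1
    for _ in range(n - 1):
        probs = seating_probabilities(sizes, alpha, theta)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(sizes):                  # last entry means a new table
            sizes.append(1)
        else:
            sizes[choice] += 1
    return sizes

sizes = simulate_crp(1000, 0.5, 1.0, random.Random(0))
```

The $k+1$ probabilities sum to one because $\sum_i (n_i - \alpha) + k\alpha + \theta = n + \theta$.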

When $\alpha = 1$, $\Pi_\infty$ consists of all singleton blocks since no customer ever
sits at an occupied table. So we assume henceforth that $0 \le \alpha < 1$. Basic
facts are that the block of $\Pi_\infty$ associated with table $j$ has an almost sure
limiting frequency $P_j$, and that the $P_j$ may be represented as

(4)    $$(P_1, P_2, \ldots) = (W_1, \overline{W}_1 W_2, \overline{W}_1 \overline{W}_2 W_3, \ldots),$$

where the $W_i$ are independent, $W_i \sim$ beta$(1-\alpha, \theta + i\alpha)$ and $\overline{W}_i := 1 - W_i$.
Note that the proportions $(P_1, P_2, \ldots)$ are in a size-biased random order,
corresponding to the fact that the table numbers label the blocks of $\Pi_\infty$ in
order of their least elements.
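Representation (4) is a stick-breaking recipe and can be sampled directly. The following Python sketch is a hedged illustration under the assumption $0 < \alpha < 1$ and $\theta \ge 0$ (so that both beta parameters are positive); the function name `stick_breaking` and the truncation at `num_terms` are our own choices.

```python
import random

def stick_breaking(alpha, theta, num_terms, rng):
    """First `num_terms` proportions (P_1, P_2, ...) from representation (4).

    Returns the proportions and the mass not yet allocated, which equals
    the product of the factors (1 - W_i) used so far.
    """
    remaining = 1.0
    proportions = []
    for i in range(1, num_terms + 1):
        w = rng.betavariate(1 - alpha, theta + i * alpha)   # W_i
        proportions.append(remaining * w)
        remaining *= 1 - w
    return proportions, remaining

props, leftover = stick_breaking(0.5, 1.0, 200, random.Random(1))
```

By construction the returned proportions and the leftover mass add up to one, up to floating-point rounding.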

Another basic fact, read from [24], is that the number $K_n$ of occupied
tables after $n$ customers (number of blocks of $\Pi_n$) has the limiting behavior

(5)    $$K_n/n^\alpha \to L_{\alpha,\theta} = \Gamma(1-\alpha) \lim_{j \to \infty} j\,(P_j^{\downarrow})^{\alpha} \quad \text{for } 0 < \alpha < 1,$$

where $(P_j^{\downarrow}, j \ge 1)$ is the ranked sequence of proportions $(P_j, j \ge 1)$, and
$L_{\alpha,\theta}$ is a random variable with the tilted Mittag-Leffler distribution with
moments

(6)    $$E(L_{\alpha,\theta}^n) = \frac{\Gamma(\theta+1)}{\Gamma(\theta/\alpha+1)}\, \frac{\Gamma(\theta/\alpha+n+1)}{\Gamma(\theta+n\alpha+1)}.$$

This $L_{\alpha,\theta}$ is the local time variable associated with a regenerative PD$(\alpha, \theta)$
interval partition of $[0,1]$, also called its $\alpha$-diversity. For $\alpha = 0$, we have
$K_n/\log(n) \to \theta$ almost surely.
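The moment formula (6) is easy to evaluate numerically. The Python sketch below is a small illustration under the assumption $0 < \alpha < 1$ and $\theta \ge 0$; the function name `local_time_moment` is ours.

```python
from math import gamma, factorial, isclose

def local_time_moment(n, alpha, theta):
    """n-th moment of L_{alpha,theta} from formula (6); needs 0 < alpha < 1."""
    return (gamma(theta + 1) / gamma(theta / alpha + 1)
            * gamma(theta / alpha + n + 1) / gamma(theta + n * alpha + 1))

# For theta = 0 the formula reduces to Gamma(n + 1) / Gamma(n * alpha + 1).
mean = local_time_moment(1, 0.5, 0.0)
```

Setting $\theta = 0$ in (6) leaves $\Gamma(n+1)/\Gamma(n\alpha+1)$, so the mean computed above is $1/\Gamma(3/2)$.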

We now put a random total order $<$ on the tables as follows.
Independently of the process of seating of customers at tables, let the tables be
ordered from left to right according to the following scheme. Put the second
table to the right of the first with probability $\theta/(\alpha + \theta)$ and to the left with
probability $\alpha/(\alpha + \theta)$. This creates three possible locations for the third
table: put it

• to the left of the first two tables with probability $\alpha/(2\alpha + \theta)$;

• between the first two tables with probability $\alpha/(2\alpha + \theta)$;

• to the right of the first two tables with probability $\theta/(2\alpha + \theta)$.

And so on: given any one of $k!$ possible orderings of $k$ tables from left to
right, there are $k+1$ possible places for the $(k+1)$st table to be squeezed
in. The place to the right of all $k$ tables is assigned probability $\theta/(k\alpha + \theta)$;
each of the other $k$ places is assigned probability $\alpha/(k\alpha + \theta)$.
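For small $k$ the insertion scheme can be enumerated exactly. The Python sketch below is our own illustration, with hypothetical names `order_distribution` and `records` and with $\alpha, \theta > 0$ assumed. It builds the law of the left-to-right arrangement of the first $k$ tables and compares it with the closed form $(\theta/\alpha)^{R}/[\theta/\alpha]_k$ that this subsection states for the law of the table order, where $R$ counts the tables that were right-most at the time they were placed.

```python
from fractions import Fraction

def order_distribution(k, alpha, theta):
    """Exact law of the left-to-right arrangement of tables 1..k.

    Maps each arrangement (tuple of table labels, left to right) to its
    probability under the insertion scheme.
    """
    dist = {(1,): Fraction(1)}
    for j in range(1, k):                         # j tables placed, insert table j + 1
        denom = j * alpha + theta
        new_dist = {}
        for order, p in dist.items():
            for slot in range(j + 1):             # slot j is the right-most place
                weight = theta if slot == j else alpha
                new_order = order[:slot] + (j + 1,) + order[slot:]
                new_dist[new_order] = new_dist.get(new_order, 0) + p * weight / denom
        dist = new_dist
    return dist

def records(order):
    """Number of tables that were right-most among the tables placed so far."""
    position = {table: pos for pos, table in enumerate(order)}
    count, best = 0, -1
    for table in sorted(position):                # tables in order of appearance
        if position[table] > best:
            count, best = count + 1, position[table]
    return count

alpha, theta, k = Fraction(1, 3), Fraction(2), 4
dist = order_distribution(k, alpha, theta)
rising = Fraction(1)
for i in range(k):
    rising *= theta / alpha + i                   # rising factorial [theta/alpha]_k
matches = all(p == (theta / alpha) ** records(o) / rising for o, p in dist.items())
```

The agreement reflects that a table is a record exactly when it was inserted in the right-most place, which contributes a factor $\theta$ instead of $\alpha$ to the product of insertion probabilities.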

Let $\sigma_k(i)$ denote the location of the $i$th table relative to the first $k$ tables,
counting from 1 for the left-most table to $k$ for the right-most. So $\sigma_k$ is
a random permutation of $[k]$. The sequence of permutations $(\sigma_k, k \ge 1)$ is
consistent in the sense that if $\sigma_k(i) < \sigma_k(j)$ for some $k \ge i \vee j := \max\{i, j\}$,
then the same is true for all $k \ge i \vee j$. Thus, the sequence $(\sigma_k, k \ge 1)$ specifies
a random total order on $\mathbb{N}$, call it the table order. Given $\sigma_1, \ldots, \sigma_k$,

• $\sigma_{k+1}(k+1) = k+1$ with probability $\theta/(k\alpha + \theta)$;

• $\sigma_{k+1}(k+1) = i$ with probability $\alpha/(k\alpha + \theta)$ for each $i \in [k]$

and

(7)    $$\sigma_{k+1}(j) = \sigma_k(j) + 1(\sigma_{k+1}(k+1) \le \sigma_k(j)) \quad \text{for } j \in [k].$$

Thus, by construction, $(\sigma_k, k \ge 1)$ is independent of the (unordered) random
partition $\Pi_\infty$ of $\mathbb{N}$, with

$$P(\sigma_k = \pi) = \frac{(\theta/\alpha)^{R(\pi)}}{[\theta/\alpha]_k}$$

for each permutation $\pi$ of $[k]$, where

$$[x]_k = x(x+1) \cdots (x+k-1) = \Gamma(x+k)/\Gamma(x)$$

and

$$R(\pi) := \sum_{j=1}^{k} 1(\pi_j > \pi_i \text{ for all } 1 \le i < j)$$

is the number of record values in the permutation $\pi$. Note that for $k \ge 2$ the
distribution of $\sigma_k$ is uniform iff $\alpha = \theta$. The formulas apply as suitable limit
expressions: if $\alpha = 0$ and $\theta > 0$, tables are ordered in order of appearance
and $\sigma_k$ is the identity permutation (there is only one table for $\theta = 0$); if
$0 < \alpha < 1$ and $\theta = 0$, the first table remains right-most, and $\sigma_k$ is uniform
on permutations with $\pi(1) = k$. See [21, 23] for related work.

In the sequel, we will repeatedly use generalized urn scheme arguments, so
let us briefly review the main points here. See [22] and [24], Section 2.2, for
references. Recall that the distribution of a random vector $\Delta = (\Delta_1, \ldots, \Delta_m)$
with $\Delta_1 + \cdots + \Delta_m = 1$ and density

$$f_{\Delta_1, \ldots, \Delta_{m-1}}(x_1, \ldots, x_{m-1}) = \frac{\Gamma(\gamma_1 + \cdots + \gamma_m)}{\Gamma(\gamma_1) \cdots \Gamma(\gamma_m)}\, x_1^{\gamma_1 - 1} \cdots x_{m-1}^{\gamma_{m-1} - 1} (1 - x_1 - \cdots - x_{m-1})^{\gamma_m - 1}$$

on $\{(x_1, \ldots, x_{m-1}) : x_1, \ldots, x_{m-1} \ge 0, x_1 + \cdots + x_{m-1} \le 1\}$ is called the
Dirichlet distribution with parameters $\gamma_1, \ldots, \gamma_m > 0$.

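Lemma 5. (i) Consider a weight vector $\gamma = (\gamma_1, \ldots, \gamma_m)$ and a process
$(H^{(n)}, n \ge 0)$ with $H^{(0)} = 0$, where $H^{(n)} = (H_1^{(n)}, \ldots, H_m^{(n)})$ evolves according
to the updating rule to increase by 1 a component chosen with probabilities
proportional to current weights $\gamma + H^{(n)}$:

$$P(H^{(n+1)} = H^{(n)} + e_i \mid H^{(1)}, \ldots, H^{(n)}) = \frac{\gamma_i + H_i^{(n)}}{\gamma_1 + \cdots + \gamma_m + n} \quad \text{a.s.}, \ i = 1, \ldots, m,$$

where $e_i$ is the $i$th unit vector. Then

$$\frac{H^{(n)}}{n} \xrightarrow[n \to \infty]{} \Delta \sim \mathrm{Dirichlet}(\gamma_1, \ldots, \gamma_m)$$

and

$$P(H^{(n+1)} = H^{(n)} + e_i \mid H^{(1)}, \ldots, H^{(n)}, \Delta) = \Delta_i \quad \text{a.s.}, \ i = 1, \ldots, m,$$

which means that the components of increase are conditionally independent
and identically distributed according to the limiting weight proportions $\Delta$.

(ii) A vector $\Delta \sim \mathrm{Dirichlet}(\gamma_1, \ldots, \gamma_m)$ can be represented as

$$(\Delta_1, \ldots, \Delta_m) = (W_1, \overline{W}_1 W_2, \overline{W}_1 \overline{W}_2 W_3, \ldots, \overline{W}_1 \cdots \overline{W}_{m-2} W_{m-1}, \overline{W}_1 \cdots \overline{W}_{m-2} \overline{W}_{m-1}),$$

where the $W_i$ are independent, $W_i \sim$ beta$(\gamma_i, \gamma_{i+1} + \cdots + \gamma_m)$ and
$\overline{W}_i := 1 - W_i$.

If $\gamma \in \mathbb{N}^m$, the process $H$ arises when drawing from an urn with initially
$\gamma_i$ balls of color $i$, always adding a ball of the color drawn.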
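The updating rule of Lemma 5(i) can be checked by hand for short sequences of draws. The Python sketch below is an illustration with names of our own choosing (`draw_sequence_probability`, 0-based component indices). It computes the probability of a given sequence of increments and shows that reordering the draws does not change it, which is the exchangeability behind the conditionally i.i.d. description in the lemma.

```python
from fractions import Fraction

def draw_sequence_probability(gamma, draws):
    """Probability that the urn increases components in the order `draws`.

    `gamma` is the initial weight vector and `draws` lists component
    indices (0-based). Step n picks component i with probability
    (gamma_i + H_i) / (gamma_1 + ... + gamma_m + n).
    """
    counts = [0] * len(gamma)
    total = sum(gamma)
    prob = Fraction(1)
    for n, i in enumerate(draws):
        prob *= Fraction(gamma[i] + counts[i]) / (total + n)
        counts[i] += 1
    return prob

gamma = [Fraction(1, 2), Fraction(3, 2), Fraction(1)]
p1 = draw_sequence_probability(gamma, [0, 1, 1, 2])
p2 = draw_sequence_probability(gamma, [1, 2, 0, 1])   # same counts, other order
```

Both sequences increase component 0 once, component 1 twice and component 2 once, and the probability depends on the draws only through these counts.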

2.3. The composition of table sizes in the ordered Chinese Restaurant
Process. Let $\tilde{\Pi}_n$ denote the random ordered partition of $[n]$ induced by ordering
the blocks of $\Pi_n$ according to $\sigma_{K_n}$, where $K_n$ is the number of blocks of $\Pi_n$.
Let $C_n$ denote the random composition of $n$ defined by the sizes of blocks of
$\tilde{\Pi}_n$. If $C_n^*$ is the sequence of sizes of blocks of $\Pi_n$, in order of least elements
(or table label), and $K_n = k$, the $j$th term of $C_n^*$ is the $\sigma_k(j)$th term of $C_n$.

Proposition 6. (i) For each $(\alpha, \theta)$ with $0 < \alpha < 1$ and $\theta \ge 0$ the
sequence of compositions $(C_n, n \ge 1)$ defined as above is regenerative, with
decrement matrix

(8)    $$q_{\alpha,\theta}(n,m) = \binom{n}{m} \frac{n\alpha - m\alpha + m\theta}{n}\, \frac{[1-\alpha]_{m-1}}{[n-m+\theta]_m} \qquad (1 \le m \le n).$$

(ii) This sequence of compositions $(C_n, n \ge 1)$ is weakly sampling
consistent, but strongly sampling consistent only if $\alpha = \theta$.

(iii) Let $S_{n,j}$ be the number of the first $n$ customers seated in the $j$
left-most tables. Then there is the following almost sure convergence of random
sets with respect to the Hausdorff metric on closed subsets of $[0,1]$:

(9)    $$\{S_{n,j}/n, j \ge 0\} \xrightarrow[n \to \infty]{} \mathcal{Z}_{\alpha,\theta} := \{1 - \exp(-\xi_t), t \ge 0\}^{\rm cl},$$

where the left-hand side is the random discrete set of values $S_{n,j}$ rescaled
onto $[0,1]$, and the right-hand side $\mathcal{Z}_{\alpha,\theta}$ is by definition the closure of the
range of 1 minus the exponential of the subordinator $(\xi_t, t \ge 0)$ with Laplace
exponent

(10)    $$\Phi_{\alpha,\theta}(s) = \frac{s\,\Gamma(s+\theta)\,\Gamma(1-\alpha)}{\Gamma(s+\theta+1-\alpha)} \ \text{ for } \theta > 0 \quad \text{and} \quad \Phi_{\alpha,0}(s) = \frac{\Gamma(s+1)\,\Gamma(1-\alpha)}{\Gamma(s+1-\alpha)}.$$

(iv) Also, if $L_n(u)$ denotes the number of $j \in \{1, \ldots, K_n\}$ with $S_{n,j}/n \le u$,
then

(11)    $$\lim_{n \to \infty} \sup_{u \in [0,1]} |n^{-\alpha} L_n(u) - L(u)| = 0 \quad \text{a.s.},$$

where $L := (L(u), u \in [0,1])$ is a continuous local time process for $\mathcal{Z}_{\alpha,\theta}$,
meaning that the random set of points of increase of $L$ is $\mathcal{Z}_{\alpha,\theta}$ almost surely.

Note. Various characterizations of $L$ can be given in terms of $\mathcal{Z}_{\alpha,\theta}$ and
$\xi$. See below.

Proof of Proposition 6. (i) That $(C_n)$ is regenerative is proved by
induction on $n$. The case $n = 1$ is trivial, and if $(C_m, 1 \le m \le n)$ is
regenerative, then, by the seating rule, three scenarios can occur. Given customer
$n+1$ sits alone at a new first table, the remaining composition $C_n$ is trivially
distributed as $C_n$. Given customer $n+1$ sits down at the existing first table
of size $n_1$, the induction hypothesis implies that the remaining composition
is distributed as $C_{n-n_1}$, as required. Given customer $n+1$ sits neither at
a new first nor at the existing first table of size $n_1$, the seating rules are
such that he chooses his seat in the remaining composition as if he were
customer $n - n_1 + 1$ for composition $C_{n-n_1}$, and the induction hypothesis
allows us to conclude that the resulting composition of $n - n_1 + 1$ is distributed
as $C_{n-n_1+1}$, as required.

Denote by $q(n,m)$ the probability that the first block in $C_n$ is of size $m$.
Then, the seating rules imply that

(12)    $$q(n+1, m) = q(n, m-1)\, \frac{m-1-\alpha}{n+\theta} + q(n,m)\, \frac{n+\theta-m}{n+\theta} + \frac{\alpha}{n+\theta}\, 1_{\{m=1\}}, \qquad 1 \le m \le n+1,$$

where $q(n,m) = 0$ for $m > n$ or $m \le 0$. It is enough to check that the matrix
given in (8) solves (12) for $m \ge 2$, that is to show

$$\binom{n+1}{m} \frac{n\alpha + \alpha - m\alpha + m\theta}{n+1}\, \frac{[1-\alpha]_{m-1}}{[n+1-m+\theta]_m}
= \binom{n}{m-1} \frac{n\alpha - m\alpha + \alpha + m\theta - \theta}{n}\, \frac{[1-\alpha]_{m-2}}{[n-m+1+\theta]_{m-1}}\, \frac{m-1-\alpha}{n+\theta}
+ \binom{n}{m} \frac{n\alpha - m\alpha + m\theta}{n}\, \frac{[1-\alpha]_{m-1}}{[n-m+\theta]_m}\, \frac{n+\theta-m}{n+\theta}.$$

Obvious cancellations reduce this to

$$n(n\alpha + \alpha - m\alpha + m\theta) = m(n\alpha - m\alpha + \alpha + m\theta - \theta) + (n+1-m)(n\alpha - m\alpha + m\theta),$$

which is easily verified. The decrement matrix (8) was derived in [13], Section
8, as that associated with the unique regenerative composition structure
whose interval partition of $[0,1]$ has ranked lengths distributed according
to the Poisson-Dirichlet distribution with parameters $(\alpha, \theta)$. Thus, formula
(8) gives the decrement matrix of a weakly sampling consistent family of
regenerative compositions.
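The verification that (8) solves (12) can also be checked mechanically in exact arithmetic. The Python sketch below is only a numerical sanity check under the assumption $\theta > 0$ (so that no rising factorial in a denominator vanishes); the helper names `rising`, `q` and `recursion_rhs` and the parameter values are our own.

```python
from fractions import Fraction
from math import comb

def rising(x, k):
    """Rising factorial [x]_k = x (x + 1) ... (x + k - 1)."""
    result = Fraction(1)
    for i in range(k):
        result *= x + i
    return result

def q(n, m, alpha, theta):
    """Decrement matrix (8); zero outside 1 <= m <= n. Assumes theta > 0."""
    if m < 1 or m > n:
        return Fraction(0)
    return (comb(n, m) * ((n - m) * alpha + m * theta) / n
            * rising(1 - alpha, m - 1) / rising(n - m + theta, m))

def recursion_rhs(n, m, alpha, theta):
    """Right-hand side of (12) for q(n + 1, m)."""
    rhs = q(n, m - 1, alpha, theta) * (m - 1 - alpha) / (n + theta)
    rhs += q(n, m, alpha, theta) * (n + theta - m) / (n + theta)
    if m == 1:
        rhs += alpha / (n + theta)
    return rhs

alpha, theta = Fraction(1, 3), Fraction(3, 4)
ok = all(q(n + 1, m, alpha, theta) == recursion_rhs(n, m, alpha, theta)
         for n in range(1, 9) for m in range(1, n + 2))
```

For these parameter values the recursion holds for every $1 \le m \le n+1$ with $n \le 8$, and each row of `q` sums to one, as a decrement matrix must.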

(ii) Weak sampling consistency was a by-product of the proof of (i). Let
us show that $(C_n, n \ge 1)$ is strongly sampling consistent if and only if $\alpha = \theta$.
It is known that the compositions induced by independent uniform variables
separated by the zero-set of a $(2-2\alpha)$-dimensional Bessel bridge have the
dynamics of the Chinese Restaurant Process with seating plan $(\alpha, \alpha)$ and a
uniform block order. Also, this construction using a Bessel bridge generates a
strongly sampling consistent composition structure. On the other hand, the
ordered version of the Chinese Restaurant Process also induces a uniform
block order for $\alpha = \theta$. Conversely, calculate the following probabilities:

$$P(C_2 = (1,1)) = \frac{\alpha+\theta}{1+\theta}, \qquad P(C_2 = (2)) = \frac{1-\alpha}{1+\theta}, \qquad P(C_3 = (2,1)) = \frac{(\alpha+2\theta)(1-\alpha)}{(1+\theta)(2+\theta)},$$

and note that strong sampling consistency requires

$$\frac{(\alpha+\theta)(1-\alpha)}{(1+\theta)(2+\theta)} = P(C_2 = (1,1), C_3 = (2,1)) = P(\hat{C}_2 = (1,1), C_3 = (2,1)) = \frac{(\alpha+2\theta)(1-\alpha)}{(1+\theta)(2+\theta)} \cdot \frac{2}{3}
\iff \alpha = \theta.$$

(iii) Now (3) yields convergence in distribution in (9), and (10) was derived
in [13], formula (41). To get the almost sure convergence in (9), observe that
for each $i \ge 1$, the proportion $P_i^{(n)}$ of customers at the $i$th table in
order of appearance corresponds to the size of a gap in $\{S_{n,j}/n, j \ge 1\}$
and converges to $P_i$ almost surely as $n \to \infty$. As for the gap
$(G_i^{(n)}, D_i^{(n)})$ itself, where $D_i^{(n)} = G_i^{(n)} + P_i^{(n)}$, a simple
argument allows us to also deduce almost sure convergence as $n \to \infty$,

$$G_i^{(n)} = \sum_{j=1}^{\infty} P_j^{(n)}\,1_{\{\sigma_{j\vee i}(j) < \sigma_{j\vee i}(i)\}} \to \sum_{j=1}^{\infty} P_j\,1_{\{\sigma_{j\vee i}(j) < \sigma_{j\vee i}(i)\}} =: G_i$$

and, hence, $D_i^{(n)} \to G_i + P_i =: D_i$, using the consistent construction of
the sequence $(\sigma_k, k \ge 1)$ and the almost sure convergence of frequencies
of all classes of $\Pi_\infty$.

In particular, on a set of probability one, the following holds. For each
$\varepsilon > 0$ the locations of all gaps of length $P_j > \varepsilon$ converge,
and a simple argument shows that we can find $n_0 \ge 1$ such that, for all
$n \ge n_0$,

$$B(\{S_{n,j}/n, j \ge 1\}, \varepsilon) \supset \{G_i, D_i, i \ge 1\} \quad\text{and}\quad B(\{G_i, D_i, i \ge 1\}, \varepsilon) \supset \{S_{n,j}/n, j \ge 1\},$$

where $B(S,\varepsilon) = \{x \in [0,1] : |x-y| < \varepsilon \text{ for some } y \in S\}$
for any Borel set $S \subset [0,1]$. We deduce the almost sure Hausdorff
convergence of (9). Cf. the arXiv version [11] of [12] for a similar argument.

(iv) As for convergence of local time processes, the convergence (5) of
$L_n(1)/n^\alpha = K_n/n^\alpha$ to $L(1)$ equal to the $\alpha$-diversity of the
limiting PD$(\alpha,\theta)$ is established in [24]. Look next at a time $u$ in
the random interval $(G_1, D_1)$ associated with the first table. The dynamics of
the table ordering imply that the numbers of tables to the left of the first
table develop according to the urn scheme associated with sampling from a
beta$(1,\theta/\alpha)$ variable $\beta_{1,\theta/\alpha}$ which is independent of
$L(1)$. It follows that for $u$ in $(G_1, D_1)$ there is almost sure convergence
of $L_n(u)/n^\alpha$ to $\beta_{1,\theta/\alpha}L(1)$. Similarly, if we look at the
first $k$ tables, and count how the numbers of subsequent tables fall in the
$k+1$ gaps they create, we see the dynamics associated with sampling from a
Dirichlet distribution with its first $k$ parameters equal to 1 and the last
equal to $\theta/\alpha$; cf. Lemma 5. As $k \to \infty$, the associated cumulative
Dirichlet fractions are almost surely dense in $[0,1]$. It follows that we get
a.s. convergence in (11) for all $u$ in the random set of times
$\bigcup_{j \ge 1}(G_j, D_j)$, and that the countable random set of a.s. distinct
limit values from these intervals is a.s. dense in $[0, L(1)]$. The conclusion
then follows by a standard argument; cf. [15]. □

It is worth recording some consequences of this argument. 

Corollary 7. The collection of intervals $(G_j, D_j)$, $j \ge 1$, created from
the size-biased frequencies $(P_j, j \ge 1)$ and the independent sequence of
random permutations $(\sigma_k, k \ge 1)$ specified in (7) provides an explicit
construction of a regenerative $(\alpha,\theta)$ interval partition of $[0,1]$.

Corollary 8. Construct a random interval partition of $[0,1]$ as follows.
Let $(G_1, D_1)$ be such that the joint law of $(G_1, D_1 - G_1, 1 - D_1)$ is
Dirichlet$(\alpha, 1-\alpha, \theta)$ for some $0 < \alpha < 1$ and $\theta \ge 0$.
Given $(G_1, D_1)$, let this be one interval component, let the interval
components within $[0, G_1]$ be obtained by linear scaling of a regenerative
$(\alpha,\alpha)$ partition, and let the interval components within $[D_1, 1]$ be
obtained by linear scaling of a regenerative $(\alpha,\theta)$ partition. Then the
result is a regenerative $(\alpha,\theta)$ partition.

Proof. It is clear by construction that the split of table sizes into
those to the left of table 1, table 1, and those to the right of table 1 is
a Dirichlet$(\alpha, 1-\alpha, \theta)$ split (cf. Lemma 5), and that given this
split the dynamics of the composition to the left of table 1 and the composition
to the right of table 1 produce limits as indicated. The conclusion now follows
from the proposition. □

The particular cases $\theta = \alpha$ and $\theta = 0$ of Corollary 8 are known
[23], Proposition 15. If $\theta = 0$, then $(G_1, D_1) = (G_1, 1)$ is the last
component interval of $[0,1] \setminus Z_{\alpha,0}$ where $Z_{\alpha,0}$ can be
constructed as the restriction to $[0,1]$ of the closed range of a stable
subordinator of index $\alpha$. It is well known that the distribution of $G_1$
is then beta$(\alpha, 1-\alpha)$, in agreement with the Dirichlet$(\alpha, 1-\alpha, 0)$
split of Corollary 8, and that the restriction of $Z_{\alpha,0}$ to $[0, G_1]$ is a
scaled copy of $Z_{\alpha,\alpha}$ which can be defined by conditioning
$Z_{\alpha,0}$ on $1 \in Z_{\alpha,0}$. Otherwise put, $Z_{\alpha,0}$ and
$Z_{\alpha,\alpha}$ can be constructed as the zero sets of a Bessel process and
standard Bessel bridge of dimension $2 - 2\alpha$. In the bridge case,
$(G_1, D_1)$ can be represented as the interval covering a uniform random point
independent of $Z_{\alpha,\alpha}$, and $(G_1, D_1)$ splits $Z_{\alpha,\alpha}$ into
rescalings to $[0, G_1]$ and $[D_1, 1]$ of two independent copies of itself.

As indicated above, the local time process $(L(u), 0 \le u \le 1)$ can be
described directly in terms of $\xi$ or $Z_{\alpha,\theta}$: in the setting of
Proposition 6, we have

$$L(1 - \exp(-\xi_t)) = \Gamma(1-\alpha)\int_0^t \exp(-\alpha\xi_s)\,ds; \tag{13}$$

cf. [14], Section 5. The right-continuous inverse of $L$ satisfies

rl Ah 

(14) L- 1 (i) = l-exp(-C T J whereT, = r(l-a) ^ — — -— . 

In fact, $(1 - L^{-1}(\ell), 0 \le \ell \le L(1))$ is a self-similar Markov process
killed when reaching zero, so (13) and (14) are Lamperti's formulas [20] relating
self-similar Markov processes and Lévy processes. This observation will tie
in nicely with well-known properties of self-similar fragmentations that we
introduce in Section 4.1. Furthermore, we will use the Stieltjes measure
$dL^{-1}$ as a discrete measure on $[0, L(1)]$ to turn this interval into a string
of beads in the sense of Definition 4.

2.4. Finding the first table in the composition of table sizes. Let $(\Pi_n)$
be the sequence of random ordered partitions of $[n]$ induced by the ordered
CRP, and $C_n$ the regenerative composition structure of block sizes of $\Pi_n$
studied in Proposition 6. According to (2), for each particular composition
$(n_1, \ldots, n_\ell)$ of $n$,

$$P(C_n = (n_1, \ldots, n_\ell)) = p_{\alpha,\theta}(n_1, \ldots, n_\ell) := \prod_{j=1}^{\ell} q_{\alpha,\theta}(N_j, n_j) \quad\text{with } N_j := \sum_{i=j}^{\ell} n_i \tag{15}$$

for $q_{\alpha,\theta}$ as in (8). Now, for each $1 \le k \le \ell$, we wish to describe
the conditional probability given this event that the first customer sits at
the $k$th of these tables, which has size $n_k$.

Lemma 9. In the random ordered partition $\Pi_n$ of $[n]$, given that the
left-most block in this ordered partition is of size $n_1$, the probability that
it contains 1 is

$$\frac{n_1\theta}{n_1\theta + N_2\alpha} \qquad (N_2 := n - n_1). \tag{16}$$

Given that the composition $C_n$ of block sizes of $\Pi_n$ is
$(n_1, \ldots, n_\ell)$, for $1 \le k \le \ell$ the conditional probability that 1
falls in the $k$th block of size $n_k$ is

$$p_k \prod_{j=1}^{k-1}(1 - p_j) \quad\text{for } p_j = \frac{n_j\theta}{n_j\theta + N_{j+1}\alpha} \text{ with } N_{j+1} := \sum_{i=j+1}^{\ell} n_i. \tag{17}$$

In particular, if $\theta = \alpha$, then $\Pi_n$ is exchangeable, and the size of the
block containing 1 is a size-biased pick from the composition $C_n$ of block sizes.

Proof. It is enough to describe the conditional probability, given that
the first block has size $n_1$, that this block contains 1. For given that this
block does not contain 1, the dynamics of the ordered CRP are such that
the remainder of the ordered partition $\Pi_n$, after order-preserving bijective
relabeling (keeping label 1 fixed), makes a copy of $\Pi_{n-n_1}$. The probability
that the first block has size $n_1$ is found from (8) to be

$$q_{\alpha,\theta}(n, n_1) = \binom{n}{n_1}\frac{[1-\alpha]_{n_1-1}}{[\theta+N_2]_{n_1}}\,\frac{n_1\theta + N_2\alpha}{n} \qquad (N_2 := n - n_1) \tag{18}$$

for $1 \le n_1 \le n$. In particular, for $n_1 = n$, the probability that there is
just one block, $[n]$, is $[1-\alpha]_{n-1}/[1+\theta]_{n-1}$. This can also be seen
directly from the sequential construction of the Chinese Restaurant. The
denominator is the product of all weights for $n-1$ choices, and the numerator is
the product of weights for each new customer sitting at the same table as all
previous ones. The same direct argument shows that the probability that 1 ends
up in the left-most block along with $n_1 - 1$ other integers is

$$\binom{n-1}{N_2}\frac{[1-\alpha]_{n_1-1}[\theta]_{N_2}}{[1+\theta]_{n-1}}, \tag{19}$$

where the first factor is the number of ways to choose which of the $n-1$
integers besides 1 are not in the first block, and, whatever this choice, the
factors $[1-\alpha]_{n_1-1}$ and $[\theta]_{N_2}$ provide the product of weights of
relevant remaining choices, and the denominator is the product of total weights.
Look at the ratio of (19) and (18) to conclude. □
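Formula (17) can be checked by brute force for small $n$. The sketch below is an illustration only. It assumes the seating rule implicit in (12) and in the remarks on $\theta = 0$: customer $n+1$ joins a table of size $n_i$ with weight $n_i - \alpha$, opens a new table immediately to the left of any existing table with weight $\alpha$, and opens a new table to the right of all tables with weight $\theta$. All function names are hypothetical.

```python
from fractions import Fraction
from collections import defaultdict


def ordered_crp_law(n, alpha, theta):
    """Exact law of the ordered partition of [n] under the assumed seating rule.

    Returns a dict mapping a tuple of blocks (each a tuple of labels, listed
    from left to right) to its probability.
    """
    law = {((1,),): Fraction(1)}
    for c in range(2, n + 1):               # customer c arrives, c-1 are seated
        total = (c - 1) + theta
        new = defaultdict(Fraction)
        for blocks, p in law.items():
            for i, b in enumerate(blocks):
                joined = blocks[:i] + (b + (c,),) + blocks[i + 1:]
                new[joined] += p * (len(b) - alpha) / total   # join table i
                opened = blocks[:i] + ((c,),) + blocks[i:]
                new[opened] += p * alpha / total              # new table left of i
            new[blocks + ((c,),)] += p * theta / total        # new right-most table
        law = dict(new)
    return law


def formula_17(sizes, k, alpha, theta):
    """Conditional probability (17) that 1 is in the k-th block (k is 1-based)."""
    out = Fraction(1)
    for j in range(1, k + 1):
        n_j = sizes[j - 1]
        rest = sum(sizes[j:])                                 # N_{j+1}
        p_j = n_j * theta / (n_j * theta + rest * alpha)
        out *= p_j if j == k else 1 - p_j
    return out


def conditional_position_of_one(law):
    """Map composition -> {k: P(1 in k-th block | composition)} from the exact law."""
    joint = defaultdict(lambda: defaultdict(Fraction))
    for blocks, p in law.items():
        sizes = tuple(len(b) for b in blocks)
        k = next(i for i, b in enumerate(blocks, 1) if 1 in b)
        joint[sizes][k] += p
    return {s: {k: v / sum(d.values()) for k, v in d.items()}
            for s, d in joint.items()}


alpha, theta = Fraction(1, 3), Fraction(3, 4)   # illustrative values, theta > 0
law = ordered_crp_law(5, alpha, theta)
cond = conditional_position_of_one(law)
agrees = all(cond[s].get(k, Fraction(0)) == formula_17(s, k, alpha, theta)
             for s in cond for k in range(1, len(s) + 1))
print(agrees)
```

The enumeration grows quickly with $n$, so this is only meant as a sanity check for small cases.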

The case $\theta = 0$ deserves special mention. The probability of creating a
new table to the right of the first $k$ tables is always zero. The effect of this
is that 1 always remains in the right-most block of the ordered partition.
Formula (16) in this case must be interpreted by continuity at $\theta = 0$, to
give 0 for $1 \le n_1 \le n-1$ and 1 for $n_1 = n$. This case is exceptional in
that the size of the right-most table of the ordered restaurant has a strictly
positive limiting proportion of all customers, with beta$(1-\alpha,\alpha)$
distribution. This can be read, for example, from (4).

In all other cases the proportion at the right-most table converges almost
surely to zero, as a consequence of (3). If $\alpha > 0$, the fraction in the
left-most table tends to 0. If $\alpha = 0$ and $\theta > 0$, the fraction in the
left-most table has a limiting beta$(1,\theta)$ distribution.

As $n$ tends to infinity, the rescaled compositions $C_n$ become a limiting
interval partition $Z_{\alpha,\theta}$. Let us now study which interval of
$Z_{\alpha,\theta}$ is the limit of the block containing 1.
Proposition 10. Let $\theta > 0$. Given
$Z_{\alpha,\theta} = \{1 - \exp(-\xi_t), t \ge 0\}^{\rm cl}$, the conditional
probability for the interval $(1 - \exp(-\xi_{t-}), 1 - \exp(-\xi_t))$ to be
the limit of the block containing 1 is

$$p(e^{-\Delta\xi_t})\prod_{s<t}\bigl(1 - p(e^{-\Delta\xi_s})\bigr) \quad\text{with } p(x) := \frac{(1-x)\theta}{(1-x)\theta + x\alpha}$$

for all $t \ge 0$ with $\Delta\xi_t := \xi_t - \xi_{t-} > 0$.

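Proof. For the random ordered partition $\Pi_n = (\Pi_n(1), \ldots, \Pi_n(K_n))$
and $u \in (0,1)$, we deduce from Lemma 9, in the notation of Proposition 6,
that

$$P(1 \in \Pi_n(L_n(u)) \mid Z^n_{\alpha,\theta}) = p^{(n)}_{L_n(u)}\prod_{j=1}^{L_n(u)-1}\bigl(1 - p^{(n)}_j\bigr) \quad\text{a.s.},$$

where $Z^n_{\alpha,\theta} := \{S_{n,j}/n, j \ge 0\} \to Z_{\alpha,\theta} = [0,1] \setminus \bigcup_{i \in I}(g_i, d_i)$
almost surely, with respect to the Hausdorff metric on closed subsets of $[0,1]$.
We will refer to intervals $I_i = (g_i, d_i)$ as parts of $Z_{\alpha,\theta}$. Denote
$g_n(v) = \sup\{w \le v : w \in Z^n_{\alpha,\theta}\}$ and
$d_n(v) = \inf\{w > v : w \in Z^n_{\alpha,\theta}\}$ for $v \in (0,1)$, similarly,
$g(v)$ and $d(v)$ for $Z_{\alpha,\theta}$. For each fixed $v \in (0,1)$, we have

$$p^{(n)}_{L_n(v)} = \frac{(d_n(v) - g_n(v))\theta}{(d_n(v) - g_n(v))\theta + (1 - d_n(v))\alpha} \to \frac{(d(v) - g(v))\theta}{(d(v) - g(v))\theta + (1 - d(v))\alpha} =: p_{g(v)} \quad\text{a.s.}$$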

Now fix $\varepsilon > 0$, then there is $M$ so that there are ("big") parts
$I_1, \ldots, I_M$ of $Z_{\alpha,\theta}$ that leave less than $\theta\varepsilon/8R$
uncovered, where $R = (1 - d(u))\alpha$. Using the a.s. convergence of left and
right end points, a standard argument now shows that there is $N_0 \ge 0$ such
that, for all $n \ge N_0$,

$$\Bigl|\log\bigl(p^{(n)}_{L_n(u)}\bigr) + \sum_{j=1}^{L_n(u)-1}\log\bigl(1 - p^{(n)}_j\bigr) - \log\bigl(p_{g(u)}\bigr) - \sum_{i \in I: g_i < g(u)}\log\bigl(1 - p_{g_i}\bigr)\Bigr| < \varepsilon,$$

since

$$\Bigl|\log\frac{(1 - d(v))\alpha}{(d(v) - g(v))\theta + (1 - d(v))\alpha}\Bigr| \le \frac{(d(v) - g(v))\theta}{(1 - d(v))\alpha} \le \frac{(d(v) - g(v))\theta}{(1 - d(u))\alpha}$$

allows us to jointly bound the sums of all small parts. Therefore,

$$P(1 \in \Pi_n(L_n(u)) \mid Z^n_{\alpha,\theta}) = p^{(n)}_{L_n(u)}\prod_{j=1}^{L_n(u)-1}\bigl(1 - p^{(n)}_j\bigr) \to p_{g(u)}\prod_{i \in I: g_i < g(u)}\bigl(1 - p_{g_i}\bigr) \quad\text{a.s.}$$

Now we use dominated convergence to deduce for any bounded continuous
function $f$ on the space of closed subsets of $[0,1]$ (equipped with the
Hausdorff metric) that

$$E\bigl(f(Z^n_{\alpha,\theta})\,P(1 \in \Pi_n(L_n(u)) \mid Z^n_{\alpha,\theta})\bigr) \to E\Bigl(f(Z_{\alpha,\theta})\,p_{g(u)}\prod_{i \in I: g_i < g(u)}\bigl(1 - p_{g_i}\bigr)\Bigr).$$

However, we also have

$$E\bigl(f(Z^n_{\alpha,\theta})\,1_{\{1 \in \Pi_n(L_n(u))\}}\bigr) \to E\bigl(f(Z_{\alpha,\theta})\,1_{\{u \in (G_1, D_1)\}}\bigr) = E\bigl(f(Z_{\alpha,\theta})\,P(u \in (G_1, D_1) \mid Z_{\alpha,\theta})\bigr),$$

since the distributions of $G_1$ and $D_1$ are continuous or degenerate
($G_1 = 0$ or $D_1 = 1$) by Corollary 8. We identify

$$p_{g(u)}\prod_{i \in I: g_i < g(u)}\bigl(1 - p_{g_i}\bigr) \tag{20}$$

as a version of the conditional probability $P(u \in (G_1, D_1) \mid Z_{\alpha,\theta})$
for all $u \in (0,1)$.

Finally, conditionally given $Z_{\alpha,\theta}$, each of the countable number of
times $t$ such that $\xi_{t-} < \xi_t$ is associated with an interval
$(1 - \exp(-\xi_{t-}), 1 - \exp(-\xi_t))$ of $u$-values to which (20) applies, so
the conditional distribution of $(G_1, D_1)$ given $Z_{\alpha,\theta}$ is as
claimed. □

The limiting interval in $Z_{\alpha,\theta}$ of the block containing 1 corresponds
to a jump of the (for $\theta = 0$ killed by an infinite jump at an exponential
time $e$) subordinator $\xi$. Denote the time of this jump by $\tau$. It can now
be checked directly that the boundary points
$(1 - \exp(-\xi_{\tau-}), 1 - \exp(-\xi_\tau))$ describe a
Dirichlet$(\alpha, 1-\alpha, \theta)$ split of $[0,1]$ as shown in Corollary 8.
Standard thinning arguments for the Poisson point process
$(\Delta\xi_t, t \ge 0)$ show that $\xi_{\tau-} = \xi^\circ_\tau$ where $\xi^\circ$ is a
subordinator independent of $\tau$ with Lévy measure
$(1 - p(e^{-x}))\Lambda_{\alpha,\theta}(dx)$ and Laplace exponent

$$\Phi_0(s) = \int_0^\infty (1 - e^{-sx})(1 - p(e^{-x}))\,\Lambda_{\alpha,\theta}(dx)$$

so that

$$E\bigl(e^{-s\xi_{\tau-}}\bigr) = \int_0^\infty e^{-t\Phi_0(s)}\,\lambda e^{-\lambda t}\,dt = \frac{\lambda}{\Phi_0(s) + \lambda},$$

where $\lambda = \Gamma(1-\alpha)\Gamma(\theta+1)/\Gamma(\theta+1-\alpha)$ is the rate of the
exponential variable $\tau$.

For simplicity, let $\theta > 0$. The case $\theta = 0$ is similar, taking into
account the killing at the infinite jump. We find the Lévy measure
$\Lambda_{\alpha,\theta}(dx)$ of $\xi$ from
$\Phi_{\alpha,\theta}(s) = \int_{(0,\infty)}(1 - e^{-sx})\,\Lambda_{\alpha,\theta}(dx)$
with $\Phi_{\alpha,\theta}$ given in (10) (cf. also [13], formula (41)) and change
variables $u = e^{-x}$ to get

$$\Phi_0(s) = \Phi_{\alpha,\theta}(s) - \theta\int_0^1 (1 - u^s)u^{\theta-1}(1-u)^{-\alpha}\,du$$
$$= sB(s+\theta, 1-\alpha) - \theta\bigl(B(\theta, 1-\alpha) - B(s+\theta, 1-\alpha)\bigr)$$
$$= (s+\theta)B(\theta+s, 1-\alpha) - \lambda, \quad\text{where } B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b),$$

and, hence,

$$E\bigl[e^{-s\xi_{\tau-}}\bigr] = \frac{\theta}{\theta+s}\,\frac{\Gamma(\theta)\Gamma(s+\theta+1-\alpha)}{\Gamma(\theta+1-\alpha)\Gamma(s+\theta)}.$$

These are the moments of a beta$(\theta+1-\alpha, \alpha)$ distribution for
$\exp(-\xi_{\tau-})$, so that $G_1 = 1 - \exp(-\xi_{\tau-})$ has a
beta$(\alpha, \theta+1-\alpha)$ distribution in accordance with Corollary 8.
Similarly, $\Delta\xi_\tau$ has distribution

$$\lambda^{-1}p(e^{-x})\,\Lambda_{\alpha,\theta}(dx)$$

and so the interval size $\exp(-\xi_{\tau-})(1 - \exp(-\Delta\xi_\tau))$ relative
to the remaining proportion $\exp(-\xi_{\tau-})$ can be seen to be independent of
$\exp(-\xi_{\tau-})$ and to have a beta$(1-\alpha, \theta)$ distribution. By
Lemma 5(b), this establishes the Dirichlet$(\alpha, 1-\alpha, \theta)$ distribution
of Corollary 8.
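The moment identity in this computation can be spot-checked numerically. The following sketch is illustrative only: it evaluates $\lambda/(\Phi_0(s)+\lambda)$ from the closed forms above and compares it with the $s$-th moment of a beta$(\theta+1-\alpha, \alpha)$ variable, using the standard formula $E[X^s] = \Gamma(a+s)\Gamma(a+b)/(\Gamma(a)\Gamma(a+b+s))$ for $X \sim$ beta$(a,b)$. The parameter values and function names are assumptions of the example.

```python
from math import gamma, isclose


def beta_fn(a, b):
    """Euler beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)


def laplace_xi_tau_minus(s, alpha, theta):
    """lambda / (Phi_0(s) + lambda), with Phi_0 and lambda as in the text."""
    lam = gamma(1 - alpha) * gamma(theta + 1) / gamma(theta + 1 - alpha)
    phi0 = (s + theta) * beta_fn(theta + s, 1 - alpha) - lam
    return lam / (phi0 + lam)


def beta_moment(s, a, b):
    """E[X^s] for X distributed as beta(a, b)."""
    return gamma(a + s) * gamma(a + b) / (gamma(a) * gamma(a + b + s))


alpha, theta = 0.3, 1.2   # illustrative values with 0 < alpha < 1 and theta > 0
moments_match = all(
    isclose(laplace_xi_tau_minus(s, alpha, theta),
            beta_moment(s, theta + 1 - alpha, alpha),
            rel_tol=1e-9)
    for s in (0.5, 1.0, 2.0, 3.7)
)
print(moments_match)
```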

3. Markov branching models and weighted discrete $\mathbb{R}$-trees with edge
lengths.

3.1. Markov branching models. Our formalism for combinatorial trees
follows [18], Section 2. For $n = 1, 2, \ldots$, let $T_n^\circ$ denote a random
unlabeled rooted binary tree with $n$ leaves. The sequence $(T_n^\circ, n \ge 1)$ is
said to have the Markov branching property [2, 10] if conditionally given that
the first split of $T_n^\circ$ is into tree components whose numbers of leaves are
$m$ and $n-m$, these components are like independent copies of $T_m^\circ$ and
$T_{n-m}^\circ$, respectively. The distributions of the first splits of
$T_n^\circ$, $n \ge 1$, are denoted by $(q^\circ(m, n-m), 1 \le m \le n/2)$ and
referred to as the splitting rule of $(T_n^\circ, n \ge 1)$.

For a finite set $B$, let $\mathbb{T}_B$ be the set of binary trees with leaves
labeled by $B$. For $T_n \in \mathbb{T}_{[n]}$ and $B \subset [n]$, let
$T_{n,B} \in \mathbb{T}_B$ be the reduced subtree of $T_n$ spanned by leaves in
$B$, and let $\tilde{T}_{n,B} \in \mathbb{T}_{[\#B]}$ be the image of $T_{n,B}$ after
relabeling of leaves by the increasing bijection from $B$ to $[\#B]$. It will be
convenient to label each branch point of $T_n$ by the set of leaf labels in the
subtree above the branch point. A tree $T_n \in \mathbb{T}_{[n]}$ is then uniquely
represented by a collection of subsets of $[n]$. Such a tree has the natural
interpretation as a fragmentation tree, where blocks (i.e. labels of branch
points, $[n]$ for the first branch point) fragment as one passes from one level
to the next. We will write $B \in T_n$ if $T_n$ has a vertex with label $B$.

Proposition 11. Let $(T_n^{\alpha,\theta}, n \ge 1)$ for some $0 < \alpha < 1$ and
$\theta \ge 0$ be an $(\alpha,\theta)$-tree growth process as defined in
Definition 3. Then:

(a) the delabeled process $(T_n^{\alpha,\theta,\circ}, n \ge 1)$ has the Markov
branching property with splitting rule

$$q^\circ(m, n-m) = q_{\alpha,\theta}(n-1, m) + q_{\alpha,\theta}(n-1, n-m), \quad 1 \le m < n/2,$$
$$q^\circ(n/2, n/2) = q_{\alpha,\theta}(n-1, n/2), \quad\text{if $n$ is even},$$

where $q_{\alpha,\theta}(n,m)$ is given in (8);

(b) the labeled process $(T_n^{\alpha,\theta}, n \ge 1)$ is regenerative in the sense
that for each $n \ge 1$, conditionally given that the first split of
$T_n^{\alpha,\theta}$ is by a partition $\{B, [n] \setminus B\}$ of $[n]$ with
$\#B = m$, the relabeled subtrees $\tilde{T}_{n,B}^{\alpha,\theta}$ and
$\tilde{T}_{n,[n]\setminus B}^{\alpha,\theta}$ are independent copies of
$T_m^{\alpha,\theta}$ and $T_{n-m}^{\alpha,\theta}$, respectively.

Proof. For notational convenience, we drop superscripts $\alpha, \theta$. Recall
from the Introduction the identification (1) of leaf $k+1$ of $(T_n, n \ge 1)$
and customer $k$ of the regenerative composition structure $(C_n, n \ge 1)$ of the
ordered Chinese Restaurant Process described in Proposition 6, for all
$k \ge 1$. This identifies $C_{n-1}$ as the composition of subtree sizes growing
off the spine from the root to leaf 1. In particular, we see that for each
$n \ge 2$ the distribution $q^\circ$ stated here applies as splitting rule at the
first branch point of $T_n$ and indeed on the spine of $T_n$.

To establish the Markov branching property, proceed by induction. $T_1^\circ$,
$T_2^\circ$ and $T_3^\circ$ trivially have the Markov branching property. Assume
that the property is true for $T_1^\circ, \ldots, T_n^\circ$ for some $n \ge 3$.
Then, by the growth procedure, two scenarios can occur. Given $n+1$ attaches to
the trunk, the subtrees of $T_{n+1}^\circ$ are $T_n^\circ$ and the deterministic
tree with single leaf $n+1$; they are trivially conditionally independent and,
by the induction hypothesis, have distributions as required. Given $n+1$
attaches in one or the other subtree of $T_n^\circ$ of sizes $m$ and $n-m$, the
induction hypothesis yields the conditional independence and Markov branching
distributions for these subtrees, and also yields that the insertion of a new
leaf into one of these trees gives the corresponding Markov branching
distribution of size $m+1$ or $n-m+1$, respectively, by the recursive nature of
the growth procedure.

This proves (a). The induction is easily adapted to also prove (b). Just
note that the $(\alpha,\theta)$-tree growth rules are invariant under increasing
bijections from $B$ to $[\#B]$. □
3.2. Sampling consistency and the proof of Proposition 1. Recall that a
sequence of trees $(T_n^\circ, n \ge 1)$ is weakly sampling consistent if uniform
random removal of a leaf of $T_{n+1}^\circ$ yields a reduced tree with the same
distribution as $T_n^\circ$, for each $n \ge 1$.

For $(T_n^{\alpha,\theta,\circ}, n \ge 1)$ with splitting rules $q^\circ(m, n-m)$ as
before (with $m \le n-m$), to match notation with Ford [10], Proposition 41,
introduce the split probability functions

• $q^{\rm bias}(x,y)$ defined so that $q^{\rm bias}(m, n-m) = q_{\alpha,\theta}(n-1, m)$
[see (8)] is the probability that $[n]$ is first split into pieces of size $m$
and $n-m$, for $1 \le m \le n-1$, where we are supposing that the piece of size
$m$ does not contain label 1; so $q^{\rm bias}(x,y) = q_{\alpha,\theta}(x+y-1, x)$;

• $q^{\rm sym}(x,y) = \frac12 q^{\rm bias}(x,y) + \frac12 q^{\rm bias}(y,x)$ for the
symmetrization of $q^{\rm bias}$. Then we have
$q^{\rm sym}(x,y) = \frac12 q^\circ(x,y)$ for all $x < y$ and
$q^{\rm sym}(x,x) = q^\circ(x,x) = q^{\rm bias}(x,x)$ for all $x \ge 1$.

Ford uses symmetrized splitting rules to grow unlabeled planar trees. For us
they are useful for a weak sampling consistency criterion: let

$$d^{\rm sym}(x,y) := q^{\rm sym}(x,y)\Bigl(1 - \frac{q^{\rm sym}(1, x+y) + q^{\rm sym}(x+y, 1)}{x+y+1}\Bigr) - q^{\rm sym}(x+1, y)\,\frac{x+1}{x+y+1} - q^{\rm sym}(x, y+1)\,\frac{y+1}{x+y+1}.$$

Ford [10], Proposition 41, showed that $(T_n^\circ)$ is weakly sampling consistent
if and only if $d^{\rm sym}(x,y) = 0$ for all positive integers $x$ and $y$. He
verified this property for the $(\alpha, 1-\alpha)$-trees.

Proof of Proposition 1(c). For the $(\alpha,\theta)$ splitting rules we obtain

$$d^{\rm sym}(1,1) = d^{\rm sym}(1,2) = 0,$$

but

$$d^{\rm sym}(1,3) = \frac{(1-\alpha)(1-\alpha-\theta)(2-\alpha-\theta)(3-\alpha+\theta)(\alpha+\theta)}{10(1+\theta)^2(2+\theta)^2(3+\theta)}, \tag{21}$$

which shows that a necessary condition for $(T_n^\circ)$ to be weakly sampling
consistent is that $\theta$ equals either $1-\alpha$ or $2-\alpha$. Ford showed that
$\theta = 1-\alpha$ produces weakly sampling consistent trees. The proof of
part (c) of Proposition 1 is completed by the following lemma. □

Lemma 12. For $\theta = 1-\alpha$ and $\theta = 2-\alpha$, the symmetrized splitting
rules are the same. Therefore, the $(\alpha, 2-\alpha)$ tree growth process is
weakly sampling consistent.
Proof. For convenience of notation in this proof, denote the nonsymmetric
splitting rules, for Ford's case $\theta = 1-\alpha$ by

$$q^F(m, n-m) = \binom{n-1}{m}\frac{m + (n-1-2m)\alpha}{n-1}\,\frac{\Gamma(m-\alpha)\Gamma(n-m-\alpha)}{\Gamma(1-\alpha)\Gamma(n-\alpha)}$$

[see (8)], and for $\theta = 2-\alpha$ by

$$q^X(m, n-m) = \binom{n-1}{m}\frac{2m + (n-1-2m)\alpha}{n-1}\,\frac{\Gamma(m-\alpha)\Gamma(n-m+1-\alpha)}{\Gamma(1-\alpha)\Gamma(n+1-\alpha)}.$$

Now the claim is that

$$\tfrac12 q^X(m, n-m) + \tfrac12 q^X(n-m, m) = \tfrac12 q^F(m, n-m) + \tfrac12 q^F(n-m, m),$$

which after the obvious cancellations is equivalent to

$$(n-m)(2m + (n-1-2m)\alpha)(n-m-\alpha) + m(2n-2m + (2m-n-1)\alpha)(m-\alpha)$$
$$= (n-m)(m + (n-1-2m)\alpha)(n-\alpha) + m(n-m + (2m-n-1)\alpha)(n-\alpha),$$

and this is easily checked. □
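Both (21) and Lemma 12 lend themselves to a mechanical check. The sketch below is an illustration under stated assumptions: it takes (8) in the form $q_{\alpha,\theta}(n,m) = \binom{n}{m}\frac{(n-m)\alpha+m\theta}{n}\frac{[1-\alpha]_{m-1}}{[n-m+\theta]_m}$, builds $q^{\rm bias}$, $q^{\rm sym}$ and $d^{\rm sym}$ as defined in this subsection, and compares in exact rational arithmetic. The parameter values and function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def rising(x, m):
    """Rising factorial [x]_m = x (x+1) ... (x+m-1)."""
    out = Fraction(1)
    for i in range(m):
        out *= x + i
    return out


def q(n, m, alpha, theta):
    """Decrement matrix in the form assumed for (8); requires theta > 0."""
    if m < 1 or m > n:
        return Fraction(0)
    return (comb(n, m) * ((n - m) * alpha + m * theta) / n
            * rising(1 - alpha, m - 1) / rising(n - m + theta, m))


def q_bias(x, y, alpha, theta):
    """First split of [x+y] into sizes x and y, the piece of size x not containing 1."""
    return q(x + y - 1, x, alpha, theta)


def q_sym(x, y, alpha, theta):
    """Symmetrization of q_bias."""
    return (q_bias(x, y, alpha, theta) + q_bias(y, x, alpha, theta)) / 2


def d_sym(x, y, alpha, theta):
    """Ford's weak sampling consistency defect for the split (x, y)."""
    size = x + y + 1

    def qs(u, v):
        return q_sym(u, v, alpha, theta)

    return (qs(x, y) * (1 - (qs(1, x + y) + qs(x + y, 1)) / size)
            - qs(x + 1, y) * Fraction(x + 1, size)
            - qs(x, y + 1) * Fraction(y + 1, size))


def formula_21(alpha, theta):
    """Closed form (21) for d_sym(1, 3)."""
    num = ((1 - alpha) * (1 - alpha - theta) * (2 - alpha - theta)
           * (3 - alpha + theta) * (alpha + theta))
    den = 10 * (1 + theta) ** 2 * (2 + theta) ** 2 * (3 + theta)
    return num / den


a = Fraction(1, 3)   # illustrative alpha
same_sym_rules = all(
    q_sym(x, n - x, a, 1 - a) == q_sym(x, n - x, a, 2 - a)
    for n in range(2, 10) for x in range(1, n)
)
print(same_sym_rules, d_sym(1, 3, Fraction(1, 2), Fraction(1)))
```

With $\alpha = 1/2$ and $\theta = 1$, neither of the two special values of $\theta$ is hit, so the defect $d^{\rm sym}(1,3)$ is nonzero there.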

The nonsymmetrized rules are equal only if $\alpha = 1$, trivially, since this is
the deterministic comb model, where all leaves connect to a single spine.
In fact, it can be shown that these coincidences of symmetrized splitting
rules are the only such coincidences; in particular, for fixed $\alpha$, the
splitting rules as a path in the space of splitting rules, parameterized by
$\theta \ge 0$, have precisely one loop.

Let us turn to strong sampling consistency and exchangeability. 

Proof of Proposition 1(a)-(b). Assume that $(T_n, n \ge 1)$ is strongly
sampling consistent for some $\theta \in \{1-\alpha, 2-\alpha\}$; then it is not
hard to show that also the regenerative composition structure $(C_n, n \ge 1)$
generated by the associated ordered Chinese Restaurant Process is strongly
sampling consistent. By Proposition 6, this implies $\theta = \alpha$ and, hence,
$\theta = \alpha = 1/2$. On the other hand, it is well known that this case is
strongly sampling consistent. This establishes part (b) of Proposition 1.

Part (a) of Proposition 1 is easily checked for $n = 3$. The shape $T_3^\circ$ is
deterministic, as there is only one rooted binary tree with three leaves. This
tree has one leaf at height 2 and two leaves at height 3. Denote the label of
the leaf at height 2 by $M$. Then exchangeability requires

$$\frac13 = P(M = 2) = \frac{\theta}{1+\theta} \implies \theta = \frac12$$

and for $\theta = 1/2$,

$$\frac13 = P(M = 3) = \frac{\alpha}{1+\theta} = \frac{2\alpha}{3} \implies \alpha = \frac12,$$

using the growth rules. This completes the proof of Proposition 1. □

We conclude this subsection by a study of boundary cases. For $\alpha = 1$,
we have a comb model (all leaves directly attached to a single spine) with
nonuniform labeling (for $\theta = 1$, leaves $2, 3, \ldots$ are exchangeable, and
for $\theta = 0$, leaves $3, 4, \ldots$ are exchangeable), but strongly sampling
consistent as the delabeled trees are deterministic. The trees grow linearly in
height.

For $\alpha = 0$, we get a tree growth model that one might call internal
boundary aggregation on the complete binary tree in a beta$(1,\theta)$ random
environment. Informally, attach $n+1$ to $T_n$ at the terminal state of a walker
climbing the tree by flipping the beta$(1,\theta)$ coin corresponding to each
branch point until he reaches a leaf of $T_n$. Insert $n+1$ by replacing the leaf
by a new branch point connected to the leaf and $n+1$.

More formally, let $\mathbb{X} = \bigcup_{n \ge 0}\{0,1\}^n$ be the complete rooted
binary tree, where $\{0,1\}^0 = \{\emptyset\}$ contains only the empty word and
elements of $\{0,1\}^n$ are identified as binary words of length $n$. Mark all
vertices of $\mathbb{X}$ by independent beta$(1,\theta)$ random variables $W_x$,
$x \in \mathbb{X}$. Consider the binary tree growth process with edge selection
rule as follows:

(i)$_W$ Let a walker start from $Z_0 = [n]$, with $X_0 = \emptyset$ (for $k = 0$),
with steps as in (ii)$_W$.

(ii)$_W$ Given $T_n$ and a word $X_k$, let $X_{k+1} \sim$ Bernoulli$(W_{X_k})$. If
$X_{k+1} = 1$ and $Z_k$ has children $B$ and $Z_k \setminus B$, where $B$ contains
the smallest label of $Z_k$, set $Z_{k+1} = B$, otherwise
$Z_{k+1} = Z_k \setminus B$. If $\#Z_{k+1} \ge 2$, repeat (ii)$_W$. Otherwise select
edge $Z_{k+1} = \{L_{n+1}\}$.

In our formalism where $T_n$ is a collection of subsets of $[n]$, the growth step
can be made explicit as
$T_{n+1} = \{B \cup \{n+1\} : L_{n+1} \in B \in T_n\} \cup \{B : L_{n+1} \notin B \in T_n\} \cup \{\{L_{n+1}\}, \{n+1\}\}$.
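The set-valued growth step can be written down directly. The following minimal sketch represents $T_n$ as a frozenset of frozensets (one per vertex label) and applies the update for a given selected leaf label $L_{n+1}$; how $L_{n+1}$ is chosen is left to the caller. The function name `insert_leaf` and the representation are assumptions of this example.

```python
def insert_leaf(tree, leaf_label):
    """Return T_{n+1} from T_n when the leaf edge below {leaf_label} is selected.

    `tree` is a frozenset of frozensets: the vertex labels of T_n, including
    the root label [n] and all singleton leaf labels.
    """
    n = len(max(tree, key=len))        # the largest vertex label is [n]
    new_leaf = n + 1
    grown = set()
    for block in tree:
        # Vertices on the path from the root to the selected leaf gain n+1;
        # the old leaf label {L} turns into the new branch point {L, n+1}.
        grown.add(block | {new_leaf} if leaf_label in block else block)
    grown.add(frozenset({leaf_label}))  # the selected leaf is re-created
    grown.add(frozenset({new_leaf}))    # the new leaf n+1
    return frozenset(grown)


# T_2 is the Y-shaped tree with leaves 1 and 2.
T2 = frozenset({frozenset({1, 2}), frozenset({1}), frozenset({2})})
T3 = insert_leaf(T2, 2)
print(sorted(sorted(block) for block in T3))
```

A binary tree with $n$ leaves has $2n-1$ vertex labels in this representation, which gives a quick consistency check after each insertion.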

Proposition 13. (a) The family $(T_n)_{n \ge 1}$ grown via (i)$_W$-(ii)$_W$ is a
$(0,\theta)$-tree growth process.

(b) The labeling of $T_n$, $n \ge 3$, is not exchangeable for any $\theta \ge 0$;
the trees are weakly sampling consistent if and only if $\theta = 0$ or
$\theta = 1$ or $\theta = 2$; the trees grow logarithmically (except for
$\theta = 0$, when the model is the comb model and growth is linear).

Proof. (a) This follows directly from the growth rules of the $(0,\theta)$-tree
growth process, since internal edges are never selected for insertions. The
first branch point separates 1 and 2. At this branch point, and inductively
every other branch point, an urn scheme governs the selection procedure,
with initial weight 1 for the subtree of the larger label, $\theta$ for the
subtree of the smaller label, so a beta$(1,\theta)$ limiting proportion of
insertions will take place in the subtree of the larger label; cf. Lemma 5.

(b) The exchangeability claim follows easily from the growth procedures.
Weak sampling consistency can be read from (21), which also holds for
$\alpha = 0$. Logarithmic growth follows from the following considerations. Just
as we argued for $0 < \alpha < 1$ in the Introduction, also for $\alpha = 0$, the
height $K_n$ of leaf 1 in $T_n$ has the same dynamics as the number of tables in
a Chinese Restaurant Process with $(0,\theta)$ seating plan. In this case $K_n$ is
known to grow logarithmically, with $K_n/\log(n) \to \theta$ if $\theta > 0$. It is
easy to see that also the rescaled height of leaf $k$ converges to $\theta$. □

Note that the height of the branch point between any two leaves $j$ and $k$
is constant, hence converges to zero when rescaled by $\log(n)$. Therefore, in
a logarithmically scaled limit tree all leaves would be adjacent to the root
with no further branching structure.

3.3. Weighted discrete $\mathbb{R}$-trees with edge lengths. A pointed compact
metric space $(T, d, \rho)$ is called a compact $\mathbb{R}$-tree with root
$\rho \in T$ if it is complete separable path-connected and has the tree
property:

• for any $\sigma, \sigma' \in T$ there is a unique isometry
$g_{\sigma,\sigma'} : [0, d(\sigma,\sigma')] \to T$ such that
$g_{\sigma,\sigma'}(0) = \sigma$ and $g_{\sigma,\sigma'}(d(\sigma,\sigma')) = \sigma'$;
denote $[[\sigma,\sigma']] = g_{\sigma,\sigma'}([0, d(\sigma,\sigma')])$;
furthermore, any simple path from $\sigma$ to $\sigma'$ has range
$[[\sigma,\sigma']]$.

In this section we restrict our attention to $\mathbb{R}$-tree representatives of
discrete trees with edge lengths such as $T_n \in \mathbb{T}_{[n]}$ with edge
lengths $e_B \in (0,\infty)$, $B \in T_n$, where $e_B$ refers to the parent edge
below $B$, so $e_{[n]}$ is the length of the root edge. For $B \in T_n$ with
ancestors $[n] = B_0 \supset B_1 \supset \cdots \supset B_h = B$ in $T_n$, we
denote its birth time by $l_B = e_{B_0} + \cdots + e_{B_{h-1}}$ and its death time
by $r_B = l_B + e_B$. Recall, for example, from [7] that we can associate a real
tree as a subset of $T_n \times [0,\infty)$ as

$$T = \{([n], 0)\} \cup \{(B, s) : B \in T_n, s \in (l_B, r_B]\} \tag{22}$$

in canonical form, so that $E_B := B \times (l_B, r_B]$ represents the edge below
$B$ of Euclidean length $e_B = r_B - l_B$; cf. Figure 4. We refer to $T_n$ as the
shape of $T$. We define the root $\rho = ([n], 0)$ and a metric $d$ on $T$ that
extends the natural Euclidean metric on the edges and that connects the edges
to a tree. If $\sigma = (B, s) \in T$, then we set $d(\rho, \sigma) = s$. Let
$\sigma' = (B', s') \in T \setminus \{\rho\}$. We define $d(\sigma, \sigma')$ by

$$d(\sigma,\sigma') = \begin{cases} d(\rho,\sigma) + d(\rho,\sigma') - 2r_{B \vee B'}, & \text{if } B \vee B' := \bigcap_{B'' \in T_n : B \cup B' \subset B''} B'' \notin \{B, B'\}, \\ |d(\rho,\sigma) - d(\rho,\sigma')|, & \text{otherwise, that is, if } B \subset B' \text{ or } B' \subset B; \end{cases}$$

here the first case is when $B \cap B' = \emptyset$, that is, there is a branch
point, the last common ancestor $B \vee B'$, for which $B$ is in one subtree and
$B'$ in the other.

Fig. 4. Canonical representation of a tree $T_5$ with edge lengths $e_B$,
$B \in T_5$.

A weighted $\mathbb{R}$-tree is equipped with a probability measure $\mu$ on the
Borel sets of $(T, d)$. As a relevant example consider an interval partition
$Z \subset [0,1]$ with local time $(L(u), 0 \le u \le 1)$. We can associate a real
tree consisting of a single branch $[0, L(1)]$ and specify $\mu$ by its
distribution function $L^{-1}$, that is, $\mu([0, L(u)]) = u$. We visualize the
atoms of different sizes lined up on $[0, L(1)]$ (particularly if they are
dense, but also if they are not dense) as a string of beads and use this term to
refer to the weighted interval; cf. Figure 3 in the Introduction for a tree
composed of strings of beads. In this specific single-branch context we have a
natural notion of convergence, namely, weak convergence of Stieltjes measures
$dL^{-1}$ as measures on $[0,\infty)$, where the interval $[0, L(1)]$ is determined
by the supremum of the support of the measure. In this sense, Proposition 6
easily yields the following convergence of strings of beads:

$$\bigl([0, n^{-\alpha}L_n(1)],\, d(n^{-\alpha}L_n)^{-1}\bigr) \to \bigl([0, L(1)],\, dL^{-1}\bigr) \quad\text{weakly a.s.} \tag{23}$$

In general, $\mathbb{R}$-trees can have features such as a dense set of branch
points ($\sigma \in T$ such that $T \setminus \{\sigma\}$ has three or more
connected components) and allow diffuse weight measures on an uncountable set
of leaves ($\sigma \in T$ such that $T \setminus \{\sigma\}$ is connected). We will
introduce a suitable space of $\mathbb{R}$-trees and the weighted
Gromov-Hausdorff notion of convergence in Section 4.1; self-similar
fragmentation trees will be introduced as relevant examples.
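The canonical form (22) and its metric are easy to compute for a concrete discrete tree. The sketch below is an illustration only: it stores a tree as a dictionary from vertex labels (frozensets) to the lengths $e_B$ of their parent edges, derives birth and death times $l_B$, $r_B$, and evaluates $d$ between two points $(B,s)$ and $(B',s')$ by the two-case formula. The example tree, its edge lengths and the function names are assumptions made for the example.

```python
def birth_death(edge_lengths):
    """Birth times l_B and death times r_B for every vertex label B.

    `edge_lengths` maps each vertex label (a frozenset) to e_B, the length of
    the edge below B. Ancestors of B are the strict supersets of B in the tree.
    """
    birth, death = {}, {}
    for block in edge_lengths:
        birth[block] = sum(e for anc, e in edge_lengths.items() if block < anc)
        death[block] = birth[block] + edge_lengths[block]
    return birth, death


def tree_distance(edge_lengths, point_a, point_b):
    """Distance between points (B, s) and (B', s') of the canonical tree (22)."""
    (block_a, s_a), (block_b, s_b) = point_a, point_b
    if block_a <= block_b or block_b <= block_a:
        # One point lies below the other: distance along a single path.
        return abs(s_a - s_b)
    # Otherwise go down to the last common ancestor, the smallest vertex
    # label containing both blocks, and back up.
    ancestor = min((blk for blk in edge_lengths if block_a | block_b <= blk),
                   key=len)
    _, death = birth_death(edge_lengths)
    return s_a + s_b - 2 * death[ancestor]


def fs(*labels):
    return frozenset(labels)


# A hypothetical tree T_3 with integer edge lengths.
edges = {fs(1, 2, 3): 1, fs(1): 2, fs(2, 3): 1, fs(2): 1, fs(3): 3}
birth, death = birth_death(edges)
print(tree_distance(edges, (fs(1), 3), (fs(3), 5)))
```

Points are passed as pairs $(B, s)$ with $s \in (l_B, r_B]$; the sketch does not validate that constraint.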
3.4. Convergence of reduced trees and the proof of Proposition 2. Recall
from the Introduction our notation $R(T_n; [k])$ for the reduced tree, the
subtree of $T_n$ spanned by leaves labeled $[k]$ and equipped with the graph
distances in $T_n$ as edge lengths. We now associate an $\mathbb{R}$-tree via
(22). Proposition 2 claims that the $(\alpha,\theta)$-tree growth process
$(T_n, n \ge 1)$ has the asymptotics

$$n^{-\alpha}R(T_n, [k]) \to \mathcal{R}_k \quad\text{in the Gromov-Hausdorff sense as } n \to \infty$$

for some limiting discrete $\mathbb{R}$-tree $\mathcal{R}_k$ with random edge
lengths and precisely $k$ leaves labeled by $[k]$. To describe the distribution
recursively, we will use notation
$S_B^k = \{(B, l_B)\} \cup \{(A, s) \in \mathcal{R}_k : A \subseteq B\}$ for the
subtree of $\mathcal{R}_k$ above $B$. In the following Proposition 14 we prove a
refinement of Proposition 2 that includes a mass measure $\mu_k$ on the branches
of $\mathcal{R}_k$.

Definition 5. Let $(S, d_k|_S, \mu_k|_S)$ be a closed connected subset of
$(\mathcal{R}_k, d_k, \mu_k)$ with mass $m = \mu_k(S) > 0$ and root $(B, s_0)$ given
by $B = \bigcup_{(A,s) \in S} A$, $s_0 = \min\{s : (A,s) \in S\}$. Then we associate
the relabeled, scaled and shifted tree $(\tilde{S}, \tilde{d}, \tilde{\mu})$ as the
canonical form (22) of the tree $S$ with edge lengths multiplied by
$m^{-\alpha}$, labels changed by the increasing bijection from $B$ to $[\#B]$,
mass measure pushed forward via these operations and then multiplied by
$m^{-1}$.

Once we have embedded $\mathcal{R}_k$ as a subtree of a CRT $(T, \rho, \mu)$, the
atoms of the mass measure $\mu_k$ will correspond to the $\mu$-masses of the
connected components of $T \setminus \mathcal{R}_k$ projected onto
$\mathcal{R}_k$. More formally, for any two $\mathbb{R}$-trees
$\mathcal{R} \subseteq T$ with common root $\rho \in \mathcal{R}$, there is a
natural projection

$$\pi^{\mathcal{R}} : T \to \mathcal{R}, \quad \sigma \mapsto g_{\rho,\sigma}\bigl(\sup\{t \ge 0 : g_{\rho,\sigma}(t) \in \mathcal{R}\}\bigr),$$

where $g_{\rho,\sigma} : [0, d(\rho,\sigma)] \to T$ is the unique isometry with
$g_{\rho,\sigma}(0) = \rho$ and $g_{\rho,\sigma}(d(\rho,\sigma)) = \sigma$. For a
measure $\mu$ on $T$, we denote the push-forward via $\pi^{\mathcal{R}}$ by

$$\pi^{\mathcal{R}}_*\mu(C) = \mu\bigl((\pi^{\mathcal{R}})^{-1}(C)\bigr), \quad C \in \mathcal{B}(\mathcal{R}) := \{D \subset \mathcal{R} : D \text{ Borel measurable}\}.$$

Denote by $\nu_n$ the empirical (probability) measure on the leaves of the
$\mathbb{R}$-tree representation of $T_n$ with unit edge lengths. We refer to
$\nu_n$ as mass measure of $T_n$.

Proposition 14. Denote by $(T_n, n \ge 1)$ an $(\alpha,\theta)$-growth process as
defined in Definition 3.

(a) Let $0 < \alpha < 1$, $\theta \ge 0$ and $k \ge 1$. We have, as $n \to \infty$,
that

$$\bigl(n^{-\alpha}R(T_n, [k]),\, \pi_*^{R(T_n,[k])}\nu_n\bigr) \to (\mathcal{R}_k, \mu_k) \quad\text{weakly a.s.} \tag{24}$$

in the sense that for all $2k-1$ edges the strings of beads converge a.s. as in
(23).

(b) Let $0 < \alpha < 1$ and $\theta \ge 0$. The distribution of
$(\mathcal{R}_k, \mu_k)$ is determined recursively as follows.
$(\mathcal{R}_1, \mu_1) = (E_{[1]}, \mu_1)$ is an $(\alpha,\theta)$-string of beads.
For $k \ge 2$, $(\mathcal{R}_k, \mu_k)$ has shape $T_k$ and the first branch point
splits $(\mathcal{R}_k, \mu_k)$ into three components: a trunk and two subtrees.
Conditionally given that $T_k$ first branches into $\{B, [k] \setminus B\}$ with
$1 \in [k] \setminus B$ and $\#B = m$, the following four random variables are
independent:

• $(H_1, H_2, H_3) = \bigl(\mu_k(E_{[k]}), \mu_k(S_B^k), \mu_k(S^k_{[k]\setminus B})\bigr) \sim$ Dirichlet$(\alpha, m-\alpha, k-m-1+\theta)$;

• the scaled and shifted trunk $(\tilde{E}_{[k]}, \tilde{\mu}_k|_{E_{[k]}})$ is an
$(\alpha,\alpha)$-string of beads;

• the relabeled, scaled and shifted subtree
$(\tilde{S}_B^k, \tilde{\mu}_k|_{S_B^k})$ is distributed as
$(\mathcal{R}_m, \mu_m)$;

• the relabeled, scaled and shifted subtree
$(\tilde{S}^k_{[k]\setminus B}, \tilde{\mu}_k|_{S^k_{[k]\setminus B}})$ as
$(\mathcal{R}_{k-m}, \mu_{k-m})$.

Proof. The proof is an extension of the proof of [18], Proposition 18.
The case $k = 1$ was established in (23). Now fix $k \ge 2$ and $T_k$. Assume,
inductively, that the proposition is proved up to tree size $k - 1$. For $n \ge k$,
the reduced trees $(R(T_n, [k]), \pi^{R(T_n,[k])}_* \nu_n)$ all have the same shape as $T_k$. In
the transition from $n$ to $n + 1$, mass increases by 1, and there may be no
change of the reduced tree, or one of the edge lengths may increase by 1.

Let us first just distinguish the weights of the trunk below the first branch
point and the two subtrees above, of sizes $m$ and $k - m$, say. We can associate
three colors with the three components. It is easy to see that the mass
allocation behaves like an urn model. The $(\alpha, \theta)$-tree growth rules specify
initial urn weights of $\alpha$, $m - \alpha$ and $k - m - 1 + \theta$. Hence, these are the
parameters of the Dirichlet distribution of limiting urn weights $(H_1, H_2, H_3)$;
cf. Lemma 5.
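The urn behaviour can be checked numerically. The sketch below is an illustration only, with hypothetical function names and arbitrarily chosen parameter values: it runs a three-colour urn with real-valued initial weights $(\alpha, m - \alpha, k - m - 1 + \theta)$ in which each draw adds weight 1 to the drawn colour, and compares the average proportions over many runs with the mean of the corresponding Dirichlet distribution.

```python
import random

def polya_urn_proportions(initial_weights, steps, rng):
    """Run an urn in which each draw adds weight 1 to the drawn colour.

    Returns the proportions of the added units per colour."""
    weights = list(initial_weights)
    counts = [0] * len(weights)
    for _ in range(steps):
        r = rng.random() * sum(weights)
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        # If rounding prevented a break, i is the last colour.
        weights[i] += 1.0
        counts[i] += 1
    return [c / steps for c in counts]

def dirichlet_mean(params):
    """Mean vector of a Dirichlet distribution with the given parameters."""
    total = sum(params)
    return [p / total for p in params]

# Illustrative values (assumptions, not taken from the text).
alpha, theta, k, m = 0.5, 0.8, 5, 2
params = (alpha, m - alpha, k - m - 1 + theta)
rng = random.Random(0)
runs = [polya_urn_proportions(params, 200, rng) for _ in range(2000)]
empirical_mean = [sum(run[i] for run in runs) / len(runs) for i in range(3)]
expected_mean = dirichlet_mean(params)
```

Only the first moment is compared here; a fuller check would compare the whole empirical distribution of the proportions with the Dirichlet law.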

Now we can treat separately the evolution of the three components, conditionally
given $(H_1, H_2, H_3)$. See the proof of [18], Proposition 18, for details
of this argument, which gives us the claimed independence.

The trunk follows the dynamics of an $(\alpha, \alpha)$ ordered CRP (when restricted
to the proportion $H_1$ of leaves added in this part of the tree) whose limiting
behavior was studied in Proposition 6 and (23). By the recursive nature
of the growth procedure, the two subtrees have the same dynamics as
$(R(T_n, [m]), \pi^{R(T_n,[m])}_* \nu_n)$ and $(R(T_n, [k-m]), \pi^{R(T_n,[k-m])}_* \nu_n)$, respectively
(when restricted to the proportions $H_2$ and $H_3$ of leaves added to these
parts), and the induction hypothesis establishes their limiting behavior. □

Proof of Proposition 2. Joint convergence with mass measures in
Proposition 14(a) implies convergence of the trees without mass measures,
so the proof of Proposition 2 is complete. □

The result in (b) is still true for $\theta = 0$, if interpreted appropriately. In
fact, leaf edges with zero edge weight disappear in the limit of (a). It is
now implicit in the above description that the limits of the associated leaves
are on branches of the limiting tree. They are not leaves themselves. In
particular, the first split is not necessarily at the first (topological) branch
point of $(\mathcal{R}_k, \mu_k)$, but (for $m = k - 1$) may be leaf 1 on the branch leading
to the first (topological) branch point. If so, it is this splitting that the recursive
description describes, with zero mass proportion for the degenerate subtree
containing 1 (zero third parameter for the Dirichlet distribution).

3.5. Growth of $(\mathcal{R}_k, \mu_k)$ by bead crushing. The recursion can be partially
solved to give the distribution of $(\mathcal{R}_k, \mu_k)$ more explicitly. Specifically,
standard Dirichlet calculations [e.g., using Lemma 5(b)] show that the mass
splits introduced by the branch points on the spine from the root to 1 lead
to Dirichlet mass splits with parameter $\theta$ for the edge adjacent to 1,
parameter $\alpha$ for all other spinal edges and parameter $m - \alpha$ for every subtree
with $m$ leaves. When applying the recursion in a subtree off the spine with
$m$ leaves, we have $m - \alpha = m - 1 + \theta$ only if $\theta = 1 - \alpha$, so only in the
$(\alpha, 1 - \alpha)$ case, the overall mass split edge by edge is Dirichlet distributed,
$\mathrm{Dirichlet}(\alpha, \ldots, \alpha, 1 - \alpha, \ldots, 1 - \alpha)$ with $\alpha$ for the $n - 1$ inner branches and
$1 - \alpha$ for the $n$ leaf edges. For $\theta \ne 1 - \alpha$, we get a mass split edge by edge
that is best described recursively. Regarding the mass distribution on edges,
we note:

Corollary 15. In the setting of Proposition 14, conditionally given $T_k$
and an edge-by-edge split

    $(\mu_k(E_B), B \in T_k) = (h_B, B \in T_k)$,

the components $(E_B, \mu_k|_{E_B})$ are independent and such that $(\tilde{E}_B, \tilde{d}_k^B, \tilde{\mu}_k^B)$
is an $(\alpha, \alpha)$-string of beads for $\#B \ge 2$ and an $(\alpha, \theta)$-string of beads for
$\#B = 1$.

Since the Dirichlet mass proportions induced by the split at the first
branch point are independent from the three rescaled components in
Proposition 14(b), the $(\alpha, \theta)$-tree growth rules can be formulated conditionally
given the Dirichlet limit variables as independent sampling from the limit
proportions (cf. Lemma 5). Furthermore, we can deduce edge selection rules
for $(\mathcal{R}_k, \mu_k)$ that are analogous to (i)$^{\mathrm{rec}}$ and (ii)$^{\mathrm{rec}}$ and indeed (i) and (ii),
for general $(\alpha, \theta)$.

Corollary 16. Let $\theta \ge 0$. Then $((\mathcal{R}_k, \mu_k), k \ge 1)$ is an inhomogeneous
Markov chain starting from an $(\alpha, \theta)$-string of beads $(\mathcal{R}_1, \mu_1) = (E_{\{1\}}, \mu_1)$,
with transition rules, as follows:

(i)$^{\mathcal{R}}$ Given $(\mathcal{R}_k, \mu_k)$, assign weight $\mu_k(E_B)$ to the edge in $(\mathcal{R}_k, \mu_k)$ labeled
$B$, $B \in T_k$.

(ii)$^{\mathcal{R}}$ Select $B_k \in T_k$ at random with probabilities proportional to the weights.
Select a bead $(J_k, m_k)$, where $J_k = (B_k, s_k) \in E_{B_k}$ and $m_k = \mu_k(\{J_k\})$
as in Proposition 10 using $(\alpha, \theta)$-selection if $\#B_k = 1$ and $(\alpha, \alpha)$-selection
if $\#B_k \ge 2$ on the string of beads $(\tilde{E}_{B_k}, \tilde{\mu}_k^{B_k})$ associated to
$(E_{B_k}, \mu_k|_{E_{B_k}})$ by shifting and scaling.

To create $\mathcal{R}_{k+1}$ from $\mathcal{R}_k$, remove from $\mathcal{R}_k$ bead $(J_k, m_k)$ and attach in $J_k$
the $m_k$-scaled and $s_k$-shifted image $(\tilde{I}_{k+1}, \tilde{\mu}_{I_{k+1}})$ of an independent
$(\alpha, \theta)$-string of beads $(I_{k+1}, \mu_{I_{k+1}})$. Relabel to include $k + 1$ so as to obtain
$\mathcal{R}_{k+1}$ in canonical form (22):

    $\mathcal{R}_{k+1} = \{(A \cup \{k+1\}, s) : (A, s) \in \mathcal{R}_k, s \le s_k, B_k \subseteq A\}$
        $\cup\ \tilde{I}_{k+1} \cup \{(A, s) \in \mathcal{R}_k : s > s_k \text{ or } B_k \not\subseteq A\}$,
    $\mu_{k+1}(C) = \mu_k(\{(A, s) \in \mathcal{R}_k \setminus \{J_k\} : (A \cup \{k+1\}, s) \in C\})$
        $+\ \tilde{\mu}_{I_{k+1}}(C \cap \tilde{I}_{k+1}) + \mu_k(C \cap (\mathcal{R}_k \setminus \{J_k\}))$.

Proof. $((\mathcal{R}_k, \mu_k), k \ge 1)$ is an inhomogeneous Markov chain because
$(\mathcal{R}_{k+1}, \mu_{k+1})$ fully determines $(\mathcal{R}_k, \mu_k), \ldots, (\mathcal{R}_1, \mu_1)$. To identify the
transition rules, fix $k \ge 1$. The proof is by induction on the steps in the recursive
growth rules. The induction step consists of proving the recursive version of
the growth rules (i)$^{\mathcal{R}}$ and (ii)$^{\mathcal{R}}$:

(i)$^{\mathcal{R}}_{\mathrm{rec}}$ Given $(\mathcal{R}_k, \mu_k)$ with first split $\{B, [k] \setminus B\}$, with $1 \in [k] \setminus B$ and
$\#B = m$, assign weights $(\mu_k(E_{[k]}), \mu_k(S_k^B), \mu_k(S_k^{[k] \setminus B}))$ to the three
components, that is, the trunk and the two subtrees above the first
branch point.

(ii)$^{\mathcal{R}}_{\mathrm{rec}}$ Select a component at random with probabilities proportional to the
weights. If a subtree with two or more leaves was selected, recursively
apply the weighting procedure (i)$^{\mathcal{R}}_{\mathrm{rec}}$ to the selected subtree.
Otherwise, denoting the selected edge or the unique edge in the selected
subtree by $E_{B_k}$, select a bead $(J_k, m_k)$, where $J_k = (B_k, s_k) \in E_{B_k}$ and
$m_k = \mu_k(\{J_k\})$ as in Proposition 10 using $(\alpha, \theta)$-selection if $\#B_k = 1$
and $(\alpha, \alpha)$-selection if $\#B_k \ge 2$ on the string of beads $(\tilde{E}_{B_k}, \tilde{\mu}_k^{B_k})$
associated with $(E_{B_k}, \mu_k|_{E_{B_k}})$ by shifting and scaling.

To prove that this recursive scheme produces the same distributions as the
limiting procedure in Proposition 14(a) that defines $(\mathcal{R}_k, \mathcal{R}_{k+1})$, we study
the independence properties in the proof of Proposition 14. The urn scheme

    $(\alpha + H_1^{(n)}, m - \alpha + H_2^{(n)}, k - m - 1 + \theta + H_3^{(n)})$,  $n \ge k$,

starting from $H^{(k)} = (H_1^{(k)}, H_2^{(k)}, H_3^{(k)}) = (0, 0, 0)$ interacts with the growth
of edges and mass measures on the subtrees only by setting the number of
steps, so that by stage $n$, this growth will have exhibited $H_1^{(n)}$ steps according
to the rules of ordered CRP and $H_2^{(n)}$ and $H_3^{(n)}$ steps, respectively, according
to the recursive growth rules for the subtrees, irrespective of $(H^{(i)}, k \le i \le n)$.
As $n \to \infty$, we obtain independence of three components $C_1, C_2, C_3$, the
$(\alpha, \alpha)$-string of beads $C_1 = (\tilde{E}_{[k]}, \tilde{\mu}_k^{[k]})$ and the relabeled, scaled and shifted
subtrees $C_2 = (\tilde{S}_k^B, \tilde{\mu}_k^B)$ and $C_3 = (\tilde{S}_k^{[k] \setminus B}, \tilde{\mu}_k^{[k] \setminus B})$, from the sigma-algebra $\mathcal{H}$
generated by $((H_1^{(n)}, H_2^{(n)}, H_3^{(n)}), n \ge k)$.

On the other hand, if $H_j^{(k+1)} = 1$, then leaf $k + 1$ is inserted in the $j$th
component, $j = 1, 2, 3$, so this selection is $\mathcal{H}$-measurable and hence
independent of $(C_1, C_2, C_3)$. Standard results on urn schemes (Lemma 5) yield
that

    $P(H_j^{(k+1)} = 1 \mid (\mathcal{R}_k, \mu_k)) = P(H_j^{(k+1)} = 1 \mid (H_1, H_2, H_3)) = H_j$  a.s.

Inductively, this argument shows that the conditional probability given
$(\mathcal{R}_k, \mu_k)$ of inserting $k + 1$ at edge $E_B$ of $\mathcal{R}_k$ is $\mu_k(E_B)$ a.s. and that,
conditionally given this edge selection, the growth on that edge follows a CRP,
when restricted to insertions to that edge. In particular, the bead selection
is done according to Proposition 10, with parameters $(\alpha, \theta)$ if $\#B = 1$ and
$(\alpha, \alpha)$ if $\#B \ge 2$; cf. Corollary 15. The insertion rule creates $E_{\{k+1\}}$ with
distribution as identified in Corollary 15. □

If $E_{B_k}$ is an internal edge, the PD$(\alpha, \alpha)$ composition structure is strongly
sampling consistent and, in fact, we select a new junction point $J_k$ with
weights proportional to $\mu_k$ restricted to $E_{B_k}$.

For $\theta = 0$, the discussion before Proposition 10 shows that the bead
selection in an $(\alpha, 0)$-string of beads always selects the last bead at the leaf.
Crushing this bead creates a new string of beads but does not split the
string the bead was selected from, hence creating a degenerate subtree. This
subtree should contain the leaf edge leading to the smallest label, say, 1, for
simplicity (noting that this occurs recursively for all other labels also), but this edge
has zero length and, in particular, no more beads. If we use the canonical
representation (22), there will be no point $(\{1\}, s)$, $s \ge 0$, in $\mathcal{R}_k$, $k \ge 2$, and
the "leaf" 1 is actually equal to $J_1$, a pseudo-branch point whose removal
creates only two connected components. Below this point, 1 is in the label
set; above it, 1 is removed from the label set.

3.6. Moment calculations for lengths and masses. Focusing particularly
on the case $k = 2$ and $\theta = 1 - \alpha$, denote by $J_1 = (\{1, 2\}, r_{\{1,2\}})$ the branch
point and by $\Sigma_i = (\{i\}, r_{\{i\}})$, $i = 1, 2$, the leaves. Then the joint distribution
of lengths

(25)    $d_2(\rho, J_1)$,  $d_2(J_1, \Sigma_1)$,  $d_2(J_1, \Sigma_2)$

was described already in [18], Proposition 18. These are dictated by the
asymptotics of urn schemes embedded in the $(\alpha, 1 - \alpha)$-tree growth process.
In the previous subsection, we described these branch lengths jointly with
the masses

(26)    $\mu_2([[\rho, J_1]])$,  $\mu_2(]]J_1, \Sigma_1]])$,  $\mu_2(]]J_1, \Sigma_2]])$

and the restrictions of $\mu_2$ to the three branches. In the $(\alpha, 1 - \alpha)$ case,
Proposition 14(b) identifies the joint distribution of the sextuple (25) and
(26) in terms of the $\mathrm{Dirichlet}(\alpha, 1 - \alpha, 1 - \alpha)$ distribution of masses (26),
and

(27)    $d_2(\rho, J_1) = \mu_2([[\rho, J_1]])^{\alpha} S_0$;  $d_2(J_1, \Sigma_1) = \mu_2([[J_1, \Sigma_1]])^{\alpha} S_1$;
        $d_2(J_1, \Sigma_2) = \mu_2([[J_1, \Sigma_2]])^{\alpha} S_2$;

where $S_0$, $S_1$ and $S_2$ are independent $\alpha$-diversities (or local times)
associated with $(\alpha, \theta)$ interval partitions with parameters $\theta = \alpha$, $\theta = 1 - \alpha$ and
$\theta = 1 - \alpha$, respectively. It could be checked by a joint moment computation
that this is consistent with the alternative description of the lengths without
the masses which was provided in [18], Proposition 18:

(28)    $d_2(\rho, J_1) = D_0 \lambda(\mathcal{R}_2)$;  $d_2(J_1, \Sigma_1) = D_1 \lambda(\mathcal{R}_2)$;
        $d_2(J_1, \Sigma_2) = D_2 \lambda(\mathcal{R}_2)$;

where $\lambda(\mathcal{R}_2)$ denotes the total length of $\mathcal{R}_2$ and $(D_0, D_1, D_2)$ has a
$\mathrm{Dirichlet}(1, (1 - \alpha)/\alpha, (1 - \alpha)/\alpha)$ distribution, independent of $\lambda(\mathcal{R}_2)$, which is
distributed as the $\alpha$-diversity of an $(\alpha, 2 - \alpha)$ interval partition. To illustrate,
the first description (27) gives

    $E(d_2(\rho, J_1)^s) = \frac{B(\alpha + \alpha s, 2 - 2\alpha)}{B(\alpha, 2 - 2\alpha)} \cdot \frac{\Gamma(\alpha + 1)\Gamma(s + 2)}{\Gamma(2)\Gamma(\alpha + s\alpha + 1)} = \frac{\Gamma(s + 1)\Gamma(2 - \alpha)}{\Gamma(2 + s\alpha - \alpha)}$,

whereas the second description (28) gives

    $E(d_2(\rho, J_1)^s) = \frac{B(1 + s, 2/\alpha - 2)}{B(1, 2/\alpha - 2)} \cdot \frac{\Gamma(3 - \alpha)\Gamma(2/\alpha + s)}{\Gamma(2/\alpha)\Gamma(3 + s\alpha - \alpha)} = \frac{\Gamma(s + 1)\Gamma(2 - \alpha)}{\Gamma(2 + s\alpha - \alpha)}$.
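The agreement of the two moment formulas can be verified numerically. The following sketch evaluates both products of Beta and Gamma factors and the common closed form. The helper `diversity_moment` is an assumption made for this illustration: it is a general $(\alpha, \theta)$ expression chosen so that it reproduces the two diversity factors displayed above for $\theta = \alpha$ and $\theta = 2 - \alpha$.

```python
from math import gamma

def beta_fn(a, b):
    """Euler Beta function B(a, b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def diversity_moment(alpha, theta, s):
    """Assumed s-th moment of the alpha-diversity of an (alpha, theta) partition."""
    return (gamma(theta + 1) * gamma(theta / alpha + s + 1)
            / (gamma(theta / alpha + 1) * gamma(theta + s * alpha + 1)))

def moment_via_masses(alpha, s):
    """First description: Beta(alpha, 2 - 2 alpha) mass to the power alpha, times S_0."""
    mass_part = beta_fn(alpha + alpha * s, 2 - 2 * alpha) / beta_fn(alpha, 2 - 2 * alpha)
    return mass_part * diversity_moment(alpha, alpha, s)

def moment_via_total_length(alpha, s):
    """Second description: Beta(1, 2/alpha - 2) proportion D_0 times total length."""
    prop_part = beta_fn(1 + s, 2 / alpha - 2) / beta_fn(1, 2 / alpha - 2)
    return prop_part * diversity_moment(alpha, 2 - alpha, s)

def closed_form(alpha, s):
    """Common value of both descriptions."""
    return gamma(s + 1) * gamma(2 - alpha) / gamma(2 + s * alpha - alpha)

checks = [(a, s, moment_via_masses(a, s), moment_via_total_length(a, s), closed_form(a, s))
          for a in (0.3, 0.5, 0.7) for s in (0.5, 1.0, 2.0, 3.0)]
```

Each entry of `checks` holds the three values for one pair $(\alpha, s)$; they should coincide up to floating-point error.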

The above discussion, together with the location of masses along the arms
according to appropriate regenerative PD$(\alpha, \theta)$ distributions, with masses
located at local times, fully determines the law of $(\mathcal{R}_2, \mu_2)$. What remains
to be seen is how $(\mathcal{R}_2, \mu_2)$ can be embedded in the CRT.

4. Embedding in continuum fragmentation trees. Throughout this section
we assume $0 < \alpha < 1$, since there are no CRTs (in the sense of the next
subsection) associated with $\alpha = 0$ and $\alpha = 1$ (cf. the discussion at the end
of Section 3.2).

4.1. Continuum fragmentation trees. We defined weighted $\mathbb{R}$-trees in
Section 3.3. Let us follow Evans and Winter [9] to introduce a notion of
convergence on the space $\mathbb{T}^{\mathrm{wt}}$ of weight-preserving isometry classes of weighted
$\mathbb{R}$-trees. Here, two weighted $\mathbb{R}$-trees $(\mathcal{R}, \nu)$ and $(\mathcal{T}, \mu)$ are called
weight-preserving isometric if there exists an isometry $i : \mathcal{R} \to \mathcal{T}$ with $i_* \nu = \mu$, the
push-forward of measure $\nu$ under the isometry. Informally, the notion of
convergence consists of weak convergence of probability measures and
Gromov-Hausdorff convergence of the underlying tree spaces. See also Evans et al. [8]
for Gromov-Hausdorff convergence of unweighted $\mathbb{R}$-trees and Greven,
Pfaffelhuber and Winter [16] for an alternative type of convergence for weighted
$\mathbb{R}$-trees.

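More specifically, it is shown in [9] that the distance function

    $\Delta_{\mathrm{GH}^{\mathrm{wt}}}((\mathcal{R}, \nu), (\mathcal{T}, \mu))$
        $= \inf\{\varepsilon > 0 : \exists f \in F^{\varepsilon}_{\mathcal{R},\mathcal{T}}, g \in F^{\varepsilon}_{\mathcal{T},\mathcal{R}} : d_P(f_* \nu, \mu) \le \varepsilon \text{ and } d_P(\nu, g_* \mu) \le \varepsilon\}$

gives rise to a Polish topology on $\mathbb{T}^{\mathrm{wt}}$ (although $\Delta_{\mathrm{GH}^{\mathrm{wt}}}$ is not itself a metric),
where

    $F^{\varepsilon}_{\mathcal{R},\mathcal{T}} = \{f : \mathcal{R} \to \mathcal{T} : \sup_{x,x' \in \mathcal{R}} |d_{\mathcal{R}}(x, x') - d_{\mathcal{T}}(f(x), f(x'))| \le \varepsilon\}$    (set of $\varepsilon$-isometries),

    $d_P(\mu, \mu') = \inf\{\varepsilon : \forall C \subseteq \mathcal{T} \text{ closed}, \mu(C) \le \mu'(\{x \in \mathcal{T} : d(x, C) \le \varepsilon\}) + \varepsilon\}$    (Prohorov distance).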

Note that convergence of the form (23) for strings of beads and, based on 
this, (24) for sequences of weighted discrete trees with edge lengths and con- 
stant combinatorial shape imply convergence in the sense defined here. How- 
ever, this notion of convergence also allows convergence to trees with more 
complicated branching structure such as continuum fragmentation trees. 

We will further use this notion of convergence to establish projective limits 
of subsets of a CRT, where the measures on the subsets are just projections 
of the CRT mass measure. The following elementary lemma will be useful. 

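Lemma 17. Let $\mathcal{R} \subseteq \mathcal{T}$ be two $\mathbb{R}$-trees, $\mu$ a measure on $\mathcal{T}$ and $\nu = \pi_* \mu$
the push-forward under the projection map $\pi : \mathcal{T} \to \mathcal{R}$. Then

    $\Delta_{\mathrm{GH}^{\mathrm{wt}}}((\mathcal{R}, \nu), (\mathcal{T}, \mu)) \le d_{\mathrm{Haus}(\mathcal{T})}(\mathcal{R}, \mathcal{T})$

for the Hausdorff distance $d_{\mathrm{Haus}(\mathcal{T})}$ on compact subsets of $\mathcal{T}$.

Proof. Just consider the projection map $g = \pi$ and the inclusion map
$f : \mathcal{R} \to \mathcal{T}$; then for $\varepsilon = d_{\mathrm{Haus}(\mathcal{T})}(\mathcal{R}, \mathcal{T})$, we have $f \in F^{\varepsilon}_{\mathcal{R},\mathcal{T}}$, $g \in F^{\varepsilon}_{\mathcal{T},\mathcal{R}}$,
$d_P(\nu, g_* \mu) = 0$ and $d_P(f_* \nu, \mu) = d_P(\nu, \mu) \le \varepsilon$. □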

A random weighted rooted binary $\mathbb{R}$-tree $(\mathcal{T}, d, \rho, \mu)$ is called a binary
fragmentation CRT of index $\gamma > 0$, if

• $\mu$ is nonatomic a.s., assigning positive weight to the subtrees
$\mathcal{T}_\sigma = \{\sigma' \in \mathcal{T} : \sigma \in [[\rho, \sigma']]\}$ for all nonleaf $\sigma \in \mathcal{T}$, and zero weight to all branches
$[[\rho, \sigma]]$, for all $\sigma \in \mathcal{T}$, and

• for all $t \ge 0$ the connected components $(\mathcal{T}_t^i, i \ge 1)$ of $\{\sigma \in \mathcal{T} : d(\rho, \sigma) > t\}$,
completed by a root vertex $\rho_i$, are such that given $(\mu(\mathcal{T}_t^i), i \ge 1) = (m_i, i \ge 1)$
for some $m_1 \ge m_2 \ge \cdots \ge 0$, the trees

    $(\mathcal{T}_t^i, m_i^{-\gamma} d|_{\mathcal{T}_t^i}, \rho_i, m_i^{-1} \mu|_{\mathcal{T}_t^i})$,  $i \ge 1$,

are like independent identically distributed isometric copies of $(\mathcal{T}, d, \rho, \mu)$.

Haas and Miermont [17] and Bertoin [3] observed the following. Given $(\mathcal{T}, d, \rho, \mu)$,
let $\Sigma^*$ be a random point in $\mathcal{T}$ chosen according to $\mu$, and define the mass
of the tagged subtree above $t$ as

    $S_t^* = \mu(\mathcal{T}_t^i)$ if $\Sigma^* \in \mathcal{T}_t^i$ for some $i \ge 1$, and $S_t^* = 0$ otherwise.

Then $(S_t^*, t \ge 0)$ is a decreasing self-similar Markov process in $[0, 1]$ starting
from $S_0^* = 1$ and attaining $S_t^* = 0$ in finite time, which can be expressed as

    $S_t^* = \exp\{-\xi^*_{T(t)}\}$  where  $T(t) = \inf\{u \ge 0 : \int_0^u \exp\{-\gamma \xi_r^*\}\, dr > t\}$

and $\xi^*$ is a subordinator, called the spinal subordinator, with Laplace exponent

    $\Phi^*(s) = \int_{(0,\infty)} (1 - e^{-sx}) \Lambda^*(dx)$

for some Lévy measure $\Lambda^*$ on $(0, \infty)$ with $\int_{(0,\infty)} (1 \wedge x) \Lambda^*(dx) < \infty$ that
characterizes the distribution of the binary fragmentation CRT. A jump
$\Delta \xi^*_{T(t)} = x$ corresponds to a change of mass $S_t^* = S_{t-}^* e^{-x}$ by a factor of $e^{-x}$
at height $t$, so consider the push-forward $\Lambda^*(du)$ of $\Lambda^*$ via the transformation
$u = e^{-x}$. It will be assumed in the following discussion that $\Lambda^*(dx) =
\lambda^*(x)\, dx$ for some density function $\lambda^*(x)$, so that $\Lambda^*(du) = u f^*(u)\, du$ for
some density function $f^*$ on $(0, 1)$ which is related to $\lambda^*$ by

(29)    $f^*(u) = u^{-2} \lambda^*(-\log u)$  $(0 < u < 1)$.

The introduction of the size-biasing factor $u$ is done since the normal
parameterization of fragmentation trees is by their dislocation measure

    $\nu(du) = 1_{\{u \ge 1/2\}} f^*(u)\, du$.

The size-biasing factor $u$ then arises because in our context of binary
fragmentations, $f^*$ is necessarily symmetric, meaning $f^*(u) = f^*(1 - u)$, and
given a mass split $(u, 1 - u)$ with $u < 1 - u$, the mass of the randomly tagged
fragment is multiplied by $u$ with probability $u$ and by $1 - u$ with probability
$1 - u$, but then the total rate for a ranked split $(u, 1 - u)$ with $u \ge 1/2$ is
again $u f^*(u) + (1 - u) f^*(1 - u) = f^*(u)$.

Because $\xi^*$ is a subordinator, $\{1 - \exp(-\xi_t^*), t \ge 0\}^{\mathrm{cl}}$ is a regenerative
interval partition in the sense of Section 2.1.

Proposition 18 (Spinal decomposition [4, 19]). Consider a fragmentation
CRT $(\mathcal{T}, d, \mu)$ and a random leaf $\Sigma^* \in \mathcal{T}$ whose distribution given
$(\mathcal{T}, d, \mu)$ is $\mu$. Then the spinal decomposition theorem holds for the spine
$[[\rho, \Sigma^*]]$ in the following sense. Consider the connected components $(\mathcal{T}_i, i \in I)$
of $\mathcal{T} \setminus [[\rho, \Sigma^*]]$, each completed by a root vertex $\rho_i$. Denote by $\mu^*$ the random
discrete distribution on $[[\rho, \Sigma^*]]$ obtained by assigning mass $m_i = \mu(\mathcal{T}_i)$ to
the branch point base point of $\mathcal{T}_i$ on $[[\rho, \Sigma^*]]$. Then given the string of beads
$([[\rho, \Sigma^*]], \mu^*)$, the trees

    $(\mathcal{T}_i, m_i^{-\gamma} d|_{\mathcal{T}_i}, \rho_i, m_i^{-1} \mu|_{\mathcal{T}_i})$,  $i \in I$,

are independent identically distributed isometric copies of $(\mathcal{T}, d, \mu)$.

4.2. $(\alpha, \theta)$-dislocation measures and switching probabilities. From
Proposition 14 we have $(\alpha, \theta)$-trees $(\mathcal{R}_k, \mu_k)$ which are based on weakly sampling
consistent regenerative Poisson-Dirichlet compositions. We can compare this
with sampling $k$ leaves $\Sigma_1^*, \ldots, \Sigma_k^*$ according to $\mu$ in a CRT $(\mathcal{T}, \mu)$, giving rise
to reduced fragmentation trees

    $\mathcal{R}_k^* = \bigcup_{i=1}^{k} [[\rho, \Sigma_i^*]]$,  $\mu_k^* = \pi^{\mathcal{R}_k^*}_* \mu$,

which can be thought of as being based on strongly sampling consistent
regenerative compositions that are not of Poisson-Dirichlet type [by
Proposition 6(h), the unique regenerative Poisson-Dirichlet interval partition is
not strongly sampling consistent unless $\alpha = \theta = 1/2$].

Let us first compute the appropriate dislocation measure for $(\mathcal{T}, \mu)$. We
have a (spinal-to-be) subordinator $\xi$ with Laplace exponent $c\,\Phi_{\alpha,\theta}$ given by
(10), with Lévy measure

    $\Lambda_{\alpha,\theta}(dx) = \lambda_{\alpha,\theta}(x)\, dx$,

where $\lambda_{\alpha,\theta}(x) = c\alpha (1 - e^{-x})^{-\alpha-1} e^{-(\theta+1)x} + c\theta e^{-\theta x} (1 - e^{-x})^{-\alpha}$. Here, $c$ is
a constant that was irrelevant in the context of Section 2, but that we
will choose appropriately here. In analogy with (29), we can compute the
intensity $u f_{\alpha,\theta}(u)$ of $(e^{-\Delta \xi_t})$ as

    $f_{\alpha,\theta}(u) = u^{-2} \lambda_{\alpha,\theta}(-\log(u)) = c\alpha (1 - u)^{-\alpha-1} u^{\theta-1} + c\theta u^{\theta-2} (1 - u)^{-\alpha}$,

which is nonsymmetric and, for a split $(u, 1 - u)$ with $u \ge 1/2$, gives a rate
of

    $f^{\circ}_{\alpha,\theta}(u) = u f_{\alpha,\theta}(u) + (1 - u) f_{\alpha,\theta}(1 - u)$
        $= c\alpha (u^{\theta} (1 - u)^{-\alpha-1} + (1 - u)^{\theta} u^{-\alpha-1})$
        $+\ c\theta (u^{\theta-1} (1 - u)^{-\alpha} + (1 - u)^{\theta-1} u^{-\alpha})$.

We now check that the choice $c = 1/\Gamma(1 - \alpha)$ is such that

    $\nu^{\circ}([1/2, 1 - \varepsilon]) := \int_{1/2}^{1-\varepsilon} f^{\circ}_{\alpha,\theta}(u)\, du \sim \frac{\varepsilon^{-\alpha}}{\Gamma(1 - \alpha)}$  as $\varepsilon \downarrow 0$,

which is the condition established in [18] to obtain the associated CRT as
limit in discrete approximations scaled by $n^{\alpha}$ as in Proposition 14, but in
the weakly sampling consistent case.
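A quick numerical check of these formulas may be helpful. The sketch below is illustrative only: the function names and the parameter values $\alpha = 0.5$, $\theta = 0.8$ are assumptions. It evaluates $f_{\alpha,\theta}$ and the symmetrized rate with $c = 1/\Gamma(1 - \alpha)$, and approximates the tail integral by a midpoint rule in the variable $\log(1 - u)$.

```python
from math import gamma, exp, log

def f(u, alpha, theta):
    """Density f_{alpha,theta}(u) with c = 1 / Gamma(1 - alpha)."""
    c = 1.0 / gamma(1.0 - alpha)
    return (c * alpha * (1 - u) ** (-alpha - 1) * u ** (theta - 1)
            + c * theta * u ** (theta - 2) * (1 - u) ** (-alpha))

def f_sym(u, alpha, theta):
    """Symmetrized rate u f(u) + (1 - u) f(1 - u) of a split (u, 1 - u)."""
    return u * f(u, alpha, theta) + (1 - u) * f(1 - u, alpha, theta)

def tail_mass(eps, alpha, theta, n=4000):
    """Midpoint rule for the integral of f_sym over [1/2, 1 - eps].

    Substitution v = 1 - u = exp(y), so that dv = exp(y) dy."""
    lo, hi = log(eps), log(0.5)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        v = exp(lo + (i + 0.5) * h)
        total += f_sym(1.0 - v, alpha, theta) * v * h
    return total

alpha, theta, eps = 0.5, 0.8, 1e-6
symmetry_gap = abs(f_sym(0.3, alpha, theta) - f_sym(0.7, alpha, theta))
# Ratio of the tail mass to eps^(-alpha) / Gamma(1 - alpha); close to 1 for small eps.
tail_ratio = tail_mass(eps, alpha, theta) * gamma(1 - alpha) * eps ** alpha
```

The ratio approaches 1 only as $\varepsilon \downarrow 0$; for a fixed small $\varepsilon$ it differs from 1 by lower-order terms.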

We can now compare subordinators $\xi$ with Lévy measure $\Lambda_{\alpha,\theta}$ and the
spinal subordinator $\xi^*$ in a CRT $(\mathcal{T}, \mu)$ with dislocation density $f^{\circ}_{\alpha,\theta}$. Recall
also [14, 18] that the regenerative interval partition associated with the
spinal subordinator $\xi^*$ admits a natural local time process $L^*$ as in (13) and
(14), which is such that the spinal string of beads $([[\rho, \Sigma^*]], \mu^*)$ is of the form

    $d(\rho, \Sigma^*) = L^*(1)$  and

(30)    $\mu^*(\{g_{\rho,\Sigma^*}(L^*(1 - e^{-\xi_t^*}))\}) = e^{-\xi_{t-}^*} - e^{-\xi_t^*} = -\Delta e^{-\xi_t^*}$.

In particular, we can identify the height $L^*(1 - e^{-\xi_t^*})$ in the tree of an atom
of $\mu^*$ that corresponds to a jump of $\xi^*$ at time $t$.

For the proof of Theorem 4, we will embed $(\mathcal{R}_k, \mu_k)$, $k \ge 1$, in the CRT
$(\mathcal{T}, \mu)$ of index $\alpha$ and with dislocation density $f^{\circ}_{\alpha,\theta}$. This involves solving
several problems:

• How do we embed $(\mathcal{R}_1, \mu_1)$ in $(\mathcal{T}, \mu)$? Can we make leaf $\Sigma_1$ of $\mathcal{R}_1$ close to
leaf $\Sigma_1^*$ of $\mathcal{R}_1^*$ by having their spines coincide initially? Part of the problem
is then to identify the point where the spines separate.

• Can we iterate the procedure by following the exchangeable leaf with the
smallest label $\Sigma_i^*$ off the spine of $\mathcal{R}_1$, and pass to a limit $i \to \infty$ to
identify $\mathcal{R}_1$ as a subset of $\mathcal{T}$?

• Once we have $(\mathcal{R}_1, \mu_1)$, how do we find the point where the spine of leaf
$\Sigma_2$ leaves the spine of leaf $\Sigma_1$?

• Can we iterate this to embed all $(\mathcal{R}_k, \mu_k)$ in $(\mathcal{T}, \mu)$?

Outside a CRT, we solved the third bullet point in Proposition 10 and
obtained a coin-tossing representation in the sense that we climb up the
spine tossing a coin for each of the (infinite number of subtree) masses and
stopping the first time we see heads. The heads probability depends on the
relative remaining mass $u$ after a split and can be given as a switching
probability (away from relative size $u$ to relative size $1 - u$)

    $p(u) = \frac{(1 - u)\theta}{(1 - u)\theta + u\alpha}$  where  $u = \exp(-\Delta \xi_t)$.

See also Corollary 16 for the iteration for $k \ge 2$. Although we endeavor to
embed $(\mathcal{R}_1, \mu_1)$ in $(\mathcal{T}, \mu)$, it is instructive to first try to embed $(\mathcal{R}_1^*, \mu_1^*)$ in
$(\mathcal{R}_n, \mu_n)$. Assuming for a moment that $\mathcal{R}_n \subseteq \mathcal{T}$ and $\mu_n = \pi^{\mathcal{R}_n}_* \mu$, then
a pick $\Sigma_1^*$ from $\mu$ is projected onto $\Sigma_1^{*n} = \pi^{\mathcal{R}_n}(\Sigma_1^*)$, a pick from $\mu_n$.
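The coin-tossing representation can be sketched on a finite list of bead masses, ordered from the root upwards. This is a simplification made for illustration, since the strings of beads considered here carry infinitely many atoms; the function names are hypothetical, and the convention that the last bead of a finite list is always accepted is an assumption of the sketch.

```python
def switching_probability(u, alpha, theta):
    """Heads probability p(u) for relative remaining mass u after a split."""
    if u == 0.0:
        return 1.0  # nothing remains above the bead: stop here (sketch convention)
    return (1 - u) * theta / ((1 - u) * theta + u * alpha)

def selection_probabilities(bead_masses, alpha, theta):
    """Probability that the coin-tossing walk up the spine stops at each bead."""
    remaining = sum(bead_masses)
    survive = 1.0  # probability that all previous tosses were tails
    probs = []
    for j, mass in enumerate(bead_masses):
        last = (j == len(bead_masses) - 1)
        u = 0.0 if last else (remaining - mass) / remaining
        p = switching_probability(u, alpha, theta)
        probs.append(survive * p)
        survive *= 1 - p
        remaining -= mass
    return probs

masses = [0.4, 0.1, 0.3, 0.2]
size_biased = selection_probabilities(masses, alpha=0.6, theta=0.6)
last_only = selection_probabilities(masses, alpha=0.6, theta=0.0)
```

For $\theta = \alpha$ the switching probability reduces to $1 - u$ and the selection probabilities coincide with the bead masses, whereas for $\theta = 0$ only the last bead can be selected.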

Lemma 19. (a) Given $(\mathcal{R}_1, \mu_1)$, a pick $\Sigma_1^{*1}$ from $\mu_1$ is obtained by
switching probabilities $p^*(u) = 1 - u$: given $(\mathcal{R}_1, \mu_1)$ is associated with a
spinal subordinator $\xi$, the conditional probability that $\Sigma_1^{*1}$ falls into the block
$(1 - e^{-\xi_{t-}}, 1 - e^{-\xi_t})$ of the associated interval partition is

    $(1 - e^{-\Delta \xi_t}) \prod_{s<t} e^{-\Delta \xi_s}$.

(b) Let $\theta > 0$. Denote the switching time in (a) by $\tau$. Given $(\mathcal{R}_1^*, \mu_1^*)$ and
a measurable switching probability function $(p(u), 0 < u < 1)$ with associated
switching time $\hat{\tau}$, we obtain

(31)    $(\xi_t, 0 \le t < \tau) \overset{d}{=} (\xi_t^*, 0 \le t < \hat{\tau})$

if and only if

(32)    $p(u) = \frac{(1 - u) f_{\alpha,\theta}(1 - u)}{f^{\circ}_{\alpha,\theta}(u)}$  for almost all $0 < u < 1$.

Proof. For (a) just note that

    $(1 - e^{-\xi_t}) - (1 - e^{-\xi_{t-}}) = e^{-\xi_{t-}} (1 - e^{-\Delta \xi_t}) = (1 - e^{-\Delta \xi_t}) \prod_{s<t} e^{-\Delta \xi_s}$.

For (b), note that the killed subordinator $(\xi_t, 0 \le t < \tau)$ can be described in
terms of two independent Poisson point processes of points $e^{-\Delta \xi_t}$ with tails
coin toss at intensity measure $(1 - p^*(u)) u f_{\alpha,\theta}(u)\, du$ and jumps with heads
coin toss at total intensity $\int_{(0,1)} p^*(u) u f_{\alpha,\theta}(u)\, du$.

Similarly, the killed subordinator $(\xi_t^*, 0 \le t < \hat{\tau})$ has the same description
with tails intensity measure $(1 - p(u)) u f^{\circ}_{\alpha,\theta}(u)\, du$ and heads intensity
$\int_{(0,1)} p(u) u f^{\circ}_{\alpha,\theta}(u)\, du$. It is an elementary computation to show that the tails
intensity measures are equal if and only if $p(u)$ satisfies (32) and that then
also the heads intensities coincide. □
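The elementary computation can be spot-checked numerically. The sketch below uses hypothetical function names and arbitrary parameter values. It computes $p(u)$ from (32) and compares the two tails densities, $(1 - p^*(u)) u f_{\alpha,\theta}(u) = u^2 f_{\alpha,\theta}(u)$ and $(1 - p(u)) u f^{\circ}_{\alpha,\theta}(u)$, at a few points. It also checks that $p$ reduces to $p^*(u) = 1 - u$ in the symmetric case $\alpha = \theta = 1/2$.

```python
from math import gamma

def f(u, alpha, theta):
    """Density f_{alpha,theta}(u) with c = 1 / Gamma(1 - alpha)."""
    c = 1.0 / gamma(1.0 - alpha)
    return (c * alpha * (1 - u) ** (-alpha - 1) * u ** (theta - 1)
            + c * theta * u ** (theta - 2) * (1 - u) ** (-alpha))

def f_sym(u, alpha, theta):
    """Symmetrized dislocation density u f(u) + (1 - u) f(1 - u)."""
    return u * f(u, alpha, theta) + (1 - u) * f(1 - u, alpha, theta)

def switching_p(u, alpha, theta):
    """Switching probability (32): (1 - u) f(1 - u) / f_sym(u)."""
    return (1 - u) * f(1 - u, alpha, theta) / f_sym(u, alpha, theta)

def tails_gap(u, alpha, theta):
    """Difference between the two tails densities at the point u."""
    lhs = u * u * f(u, alpha, theta)
    rhs = (1 - switching_p(u, alpha, theta)) * u * f_sym(u, alpha, theta)
    return abs(lhs - rhs)

points = [0.1, 0.25, 0.5, 0.75, 0.9]
gaps = [tails_gap(u, 0.3, 0.9) for u in points]
symmetric_case = [abs(switching_p(u, 0.5, 0.5) - (1 - u)) for u in points]
```

The heads densities $p(u) u f^{\circ}_{\alpha,\theta}(u) = u (1 - u) f_{\alpha,\theta}(1 - u)$ and $p^*(u) u f_{\alpha,\theta}(u)$ are mapped onto each other by $u \mapsto 1 - u$, which is why only their total masses are compared in the proof.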

For $\theta = 0$, the subordinator $\xi$ with Laplace exponent (10) has an infinite
jump $\Delta \xi_e = \infty$ at an exponential time $e$ with parameter $1/\Gamma(1 - \alpha)$, while $\xi^*$
does not. The calculation in the proof is still true, except that the possibility
of an infinite jump was ignored. Consequently, for (31) to hold, $\hat{\tau}$ must be
replaced by $\hat{\tau} \wedge e$ for an independent exponential time $e$ with parameter
$1/\Gamma(1 - \alpha)$, that is,

(33)    $(\xi_t, 0 \le t < \tau) \overset{d}{=} (\xi_t^*, 0 \le t < \hat{\tau} \wedge e)$.

Note that $e$ is not a jump time of $\xi^*$.

4.3. Embedding $(\mathcal{R}_k, \mu_k)$ and the proof of Theorem 4. We now carry out
the program outlined in the previous subsection and iterate the embedding
started in Lemma 19 to construct an unkilled Poisson point process $(F_t, t \ge 0)$
and then $(\mathcal{R}_1, \mu_1)$:

• Let $(\mathcal{T}, d, \rho, \mu)$ be an $\alpha$-self-similar fragmentation CRT with dislocation
density $f^{\circ}_{\alpha,\theta}$.

• Define $(\mathcal{T}^{(1)}, d^{(1)}, \rho^{(1)}, \mu^{(1)}) := (\mathcal{T}, d, \rho, \mu)$ and consider the spinal
subordinator $\xi^{*(1)}$ of a random point $\Sigma^{*(1)}$ sampled from $\mu^{(1)}$ in $\mathcal{T}^{(1)}$. Perform the
construction of Lemma 19(b) and denote by $\tau^{(1)}$ the associated switching
time; also put $\tau^{(0)} = 0$. Define

    $F_t = \exp(-\Delta \xi_t^{*(1)})$ for $0 \le t < \tau^{(1)}$,  $F_{\tau^{(1)}} = 1 - \exp(-\Delta \xi^{*(1)}_{\tau^{(1)}})$.

For $\theta = 0$, when $\tau^{(1)} = \hat{\tau} \wedge e$ in (33), terminate the construction if $\tau^{(1)} = e$.

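• For $i \ge 1$, denote by $(L^{*(i)}(u), 0 \le u \le 1)$ the local time process associated
with the interval partition $\{1 - e^{-\xi_t^{*(i)}}, t \ge 0\}^{\mathrm{cl}}$ and by

    $\rho^{(i+1)} = g_{\rho^{(i)},\Sigma^{*(i)}}(L^{*(i)}(1 - \exp(-\xi^{*(i)}_{\tau^{(i)} - \tau^{(i-1)}})))$

the junction point; cf. (30). Define

    $\mathcal{T}^{(i+1)} = \{\sigma \in \mathcal{T}^{(i)} : [[\rho^{(i)}, \sigma]] \cap [[\rho^{(i)}, \Sigma^{*(i)}]] = [[\rho^{(i)}, \rho^{(i+1)}]]\}$,

    $d^{(i+1)} = (1 - \exp(-\xi^{*(i)}_{\tau^{(i)} - \tau^{(i-1)}}))^{-\alpha}\, d^{(i)}|_{\mathcal{T}^{(i+1)}}$,

    $\mu^{(i+1)} = (1 - \exp(-\xi^{*(i)}_{\tau^{(i)} - \tau^{(i-1)}}))^{-1}\, \mu^{(i)}|_{\mathcal{T}^{(i+1)}}$.

Then consider the spinal subordinator $\xi^{*(i+1)}$ of $\Sigma^{*(i+1)} \sim \mu^{(i+1)}$ in $\mathcal{T}^{(i+1)}$.
Perform the construction of Lemma 19(b) and denote by $\tau^{(i+1)} - \tau^{(i)}$ the
associated switching time. Define

    $F_{\tau^{(i)} + t} = \exp(-\Delta \xi_t^{*(i+1)})$ for $0 \le t < \tau^{(i+1)} - \tau^{(i)}$,
    $F_{\tau^{(i+1)}} = 1 - \exp(-\Delta \xi^{*(i+1)}_{\tau^{(i+1)} - \tau^{(i)}})$.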

Proposition 20. (a) For $\theta > 0$, the process $(F_t, t \ge 0)$ is a Poisson
point process with intensity measure $u f_{\alpha,\theta}(u)\, du$ (and cemetery state 1). The
subspace $[[\rho, \Sigma_1[[ := \bigcup_{i \ge 1} [[\rho, \rho^{(i)}]]$ is such that $\Sigma_1 \in \mathcal{T}$ is a leaf a.s., and
$([[\rho, \Sigma_1]], \pi^{[[\rho,\Sigma_1]]}_* \mu)$ is a weight-preserving isometric copy of $(\mathcal{R}_1, \mu_1)$.
Furthermore, the spinal decomposition theorem holds for the spine $[[\rho, \Sigma_1]]$ and
the connected components $(\mathcal{T}_i, i \in I)$ of $\mathcal{T} \setminus [[\rho, \Sigma_1]]$; cf. Proposition 18.

(b) For $\theta = 0$, the process $(F_t, 0 \le t < e)$ is a Poisson point process
with intensity measure $u f_{\alpha,\theta}(u)\, du$ killed at an independent exponential time
$e$ with parameter $1/\Gamma(1 - \alpha)$. Denote by $I$ the index such that $\tau^{(I)} = e$. Then the
subspace $[[\rho, \Sigma_1]] := [[\rho, \rho^{(I+1)}]]$ is such that $\Sigma_1$ is not a leaf a.s. The weighted
space $([[\rho, \Sigma_1]], \pi^{[[\rho,\Sigma_1]]}_* \mu)$ is an isometric copy of $(\mathcal{R}_1, \mu_1)$. The spinal
decomposition theorem holds for the spine $[[\rho, \Sigma_1]]$.

Proof. By Lemma 19, $(F_t, 0 \le t < \tau^{(1)})$ is a Poisson point process
with intensity measure $u^2 f_{\alpha,\theta}(u)\, du$ killed at an independent exponential
time with parameter $\kappa = \int_{(0,1)} (1 - u) u f_{\alpha,\theta}(u)\, du$, and $F_{\tau^{(1)}}$ has distribution

    $\kappa^{-1} p(1 - u)(1 - u) f^{\circ}_{\alpha,\theta}(1 - u)\, du = \kappa^{-1} (1 - u) u f_{\alpha,\theta}(u)\, du$.

For $i \ge 1$, denote by $\mathcal{F}_i = \sigma((\xi^{*(1)}, \tau^{(1)}), \ldots, (\xi^{*(i)}, \tau^{(i)}))$ the $\sigma$-algebra
generated by the first $i$ spinal subordinators and their switching times. It follows
easily from the definition that $\mathcal{T}^{(i+1)} \setminus \{\rho^{(i+1)}\}$ is a connected component of
$\mathcal{T}^{(i)} \setminus [[\rho^{(i)}, \Sigma^{*(i)}]]$. By Proposition 18, the tree $(\mathcal{T}^{(i+1)}, d^{(i+1)}, \rho^{(i+1)}, \mu^{(i+1)})$
is a copy of $(\mathcal{T}^{(i)}, d^{(i)}, \rho^{(i)}, \mu^{(i)})$ that is independent of $\mathcal{F}_i$.

By induction and standard superposition results for Poisson point
processes, the process $(F_t, t \ge 0)$ is a Poisson point process with intensity
measure

    $u^2 f_{\alpha,\theta}(u)\, du + (1 - u) u f_{\alpha,\theta}(u)\, du = u f_{\alpha,\theta}(u)\, du$,

as claimed. In particular, the associated mass process $e^{-\xi_t} = \prod_{s \le t} F_s$ has the
same distribution as the process associated with $(\mathcal{R}_1, \mu_1)$.

For $\theta > 0$, completeness of $\mathcal{T}$ implies $\Sigma_1 \in \mathcal{T}$, and $e^{-\xi_\infty} = 0$ yields that $\Sigma_1$
is a leaf, since $\mu$ would otherwise assign positive mass to the subtree $\mathcal{T}_{\Sigma_1}$
above $\Sigma_1$. For $\theta = 0$, note that $\Sigma_1 \in \mathcal{T}^{(I)} \setminus \{\rho^{(I)}\}$.

The spinal decomposition theorem follows by a simple induction, from
a version of Proposition 18 where $\Sigma^*$ is replaced by $\rho^{(2)}$. That result is
proved like Proposition 18 using partition-valued fragmentation processes
and stopping lines; see [4, 19]. □

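The interval $[[\rho, \Sigma_1]]$ has length

(34)    $d(\rho, \Sigma_1) = \Gamma(1 - \alpha) \int_0^\infty \exp(-\alpha \xi_t)\, dt = L(1)$,

whereas the interval $[[\rho, \Sigma_1^*]]$ has length

(35)    $d(\rho, \Sigma_1^*) = \Gamma(1 - \alpha) \int_0^\infty \exp(-\alpha \xi_t^{*(1)})\, dt = L^{*(1)}(1)$.

We have joined these two intervals at a junction point $J_{1,1^*} = \rho^{(2)}$ at distance

(36)    $d(\rho, J_{1,1^*}) = \Gamma(1 - \alpha) \int_0^{\tau^{(1)}} \exp(-\alpha \xi_t)\, dt = \Gamma(1 - \alpha) \int_0^{\tau^{(1)}} \exp(-\alpha \xi_t^{*(1)})\, dt$,

where $\tau^{(1)}$ is the switching time for the two coupled subordinators. Now the
points $\Sigma_1$, $\Sigma_1^*$, $J_{1,1^*}$ have been embedded in the CRT $(\mathcal{T}, \mu)$.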

So $\mathcal{R}_1$ and $\mathcal{R}_1^*$ are both embedded as paths in $(\mathcal{T}, \mu)$. Moreover, if we
consider the strings of beads $(\mathcal{R}_1, \mu_1)$ and $(\mathcal{R}_1^*, \mu_1^*)$ associated via (30), the
measures $\mu_1$ and $\mu_1^*$ are the projections onto $\mathcal{R}_1$ and $\mathcal{R}_1^*$ of the mass measure
$\mu$ in the CRT $(\mathcal{T}, \mu)$. We can now check that, for $\theta = 1 - \alpha$, the random length
$d(\rho, \Sigma_1)$ in (34) has the same distribution as the length $S_1$ described in [18],
Proposition 18. From previous discussions, the ranked masses of $\mu_1$ have
PD$(\alpha, \theta)$ distribution. The interval partition of $[0, 1]$ obtained by putting
these masses in the order they appear along $\mathcal{R}_1 = [[\rho, \Sigma_1]]$ is that associated
with an $(\alpha, \theta)$ regenerative composition of $[0, 1]$.

Turning to $k = 2$, we identified switching probabilities in Proposition 10
that identify the branch point for $\mathcal{R}_2$ in $\mathcal{R}_1$. As $\mathcal{R}_1$ has been embedded
in $\mathcal{T}$, we identify the branch point in $\mathcal{T}$. Since the spinal decomposition
theorem holds for the spine $[[\rho, \Sigma_1]]$, to embed $\Sigma_2$, we repeat in the subtree
thus identified the procedure we used to embed $\Sigma_1$ in $\mathcal{T}$. In particular, this
procedure also constructs the mass measure $\mu_2$ as the projection onto $\mathcal{R}_2$ of
the mass measure $\mu$ on the CRT.

An inductive step from $(\mathcal{R}_k, \mu_k)$ to $(\mathcal{R}_{k+1}, \mu_{k+1})$ now completes the
embedding and hence the proof of Theorem 4. The inductive assumption will
be that $(\mathcal{R}_k, \mu_k)$ has been embedded in the CRT with $\mu_k$ the projection of
the mass measure $\mu$ of $\mathcal{T}$, along with a description of $\mu_k$ as in Proposition
14.

This establishes the following corollary to Proposition 20.

Corollary 21. Given $(\mathcal{R}_k, \mu_k)$ embedded in $(\mathcal{T}, \mu)$, proceed as in
Corollary 16: first pick an edge according to the allocation of mass to edges by $\mu_k$.
If the edge is an inner edge, pick $J_k$ from $\mu_k$ conditioned on that edge. If the
edge is a leaf edge, pick $J_k$ instead from the atoms of $\mu_k$ on this edge according
to the scheme used to pick $J_1$ from $\mathcal{R}_1$, using the obvious bijection. In
either case, distribute the mass $\mu_k(\{J_k\})$ onto a new edge $[[J_k, \Sigma_{k+1}]]$ according
to a scaled copy of the construction of $\mathcal{R}_1$ in Proposition 20. Then the
tree $(\mathcal{R}_k \cup [[J_k, \Sigma_{k+1}]])$ with measure as described is a copy of $(\mathcal{R}_{k+1}, \mu_{k+1})$.

Proof of Theorem 4. The embedding of $(\mathcal{R}_1, \mu_1)$ into $(\mathcal{T}, \mu)$ was
given in Proposition 20. An induction based on Corollary 21 completes the
embedding of $(\mathcal{R}_k, \mu_k)$, $k \ge 1$. □

4.4. Convergence of Markov branching trees and the proof of Theorem 3.
An attractive feature of the above construction is that by a fairly obvious
extension we can construct an $\mathcal{R}_k$ spanned by a root and $\Sigma_1, \ldots, \Sigma_k$
governed by the $(\alpha, \theta)$-rules, and a leaf exchangeable $\mathcal{R}_k^*$ spanned by a root and
$\Sigma_1^*, \ldots, \Sigma_k^*$, all embedded in the same CRT $(\mathcal{T}, \mu)$. Specifically, $\Sigma_{k+1}^*$ and
$\Sigma_{k+1}$ will by construction project onto the same edge of $\mathcal{R}_k$.

Proposition 22. In the above construction, $d(\Sigma_k, \Sigma_k^*) \to 0$ almost surely.

Proof. We work conditionally given (T, fi). Let 9 > 0. Let us show that, 
for all e > 0, there a.s. exists ki > 1 such that all edges of TZ^ have length 
less than e/3 and all connected components of T\TZk 1 have diameter less 
than e/3. 

First, to fix a subtree of diameter e/3, consider the connected components 

of 

{cr G T : {a' € % : d(a, a') > e/3} = 0}, 

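each completed by their root on the branches of $\mathcal{T}$. Since
$\mathcal{T}$ is compact, at most finitely many components
$\mathcal{T}_1,\ldots,\mathcal{T}_N$ actually attain height $\varepsilon/3$.
Fix a subtree $\mathcal{T}_j$ with root $R_j$, and denote its mass by $m_j$.
Note that the interval partitions $Z_\sigma$,
$\sigma\in\mathcal{T}_j\setminus\{R_j\}$, induced by
$([[\rho,\sigma]],\pi_*\mu)$, where $\pi_*\mu$ is the projection of $\mu$
onto $[[\rho,\sigma]]$, coincide on $[0,1-m_j]$, and denote the components
of the restricted interval partition by
$[0,1-m_j]\setminus Z_\sigma=\bigcup_{i\in I:\,g_i<1-m_j}(g_i,d_i)$. Now, in
the notation of the proof of Proposition 10,

\[
  q_k := \mathbb{P}\bigl(\mathcal{T}_j\cap(\mathcal{R}_k\setminus
  \mathcal{R}_{k-1})\neq\emptyset \,\big|\, \mathcal{R}_{k-1}\bigr)
  \ge m_j\prod_{i\in I:\,g_i<1-m_j}(1-p_{g_i}) \quad\text{a.s.}
\]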

is bounded below uniformly in $k$. Therefore, the first step $k$ at which
$\mathcal{T}_j\cap\mathcal{R}_k\neq\emptyset$ is bounded by a geometric
random variable, and no subtrees of height $\varepsilon/3$ can persist
outside $\mathcal{R}_k$ forever, so there a.s. exists $k_0\ge 1$ such that
$\mathcal{T}\setminus\mathcal{R}_{k_0}$ has no connected components of
diameter exceeding $\varepsilon/3$.
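
The geometric bound used in this step can be written out as follows. This is
only a sketch: the symbols $c$ and $\tau_j$ are introduced here for
illustration and are not notation from the source. Here $c>0$ stands for the
uniform lower bound on $q_k$ (it may depend on $(\mathcal{T},\mu)$ and on
$j$, but not on $k$), and $\tau_j$ stands for the first step at which the
growing tree meets $\mathcal{T}_j$.

```latex
% Assumed notation (not from the source): c > 0 with q_k >= c a.s. for all k,
% and tau_j = inf{ k >= 1 : T_j \cap R_k is nonempty }.
\[
  \mathbb{P}(\tau_j>k)
  = \mathbb{E}\bigl[\mathbf{1}_{\{\tau_j>k-1\}}\,(1-q_k)\bigr]
  \le (1-c)\,\mathbb{P}(\tau_j>k-1)
  \le\cdots\le (1-c)^k \longrightarrow 0 .
\]
% Hence tau_j is finite a.s.; with only finitely many subtrees T_1,...,T_N,
% the maximum of tau_1,...,tau_N is a.s. finite as well.
```

Since there are only finitely many subtrees to hit, taking the largest of
these hitting steps gives one admissible choice of $k_0$.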

Second, fix an edge of $\mathcal{R}_{k_0}$ of length exceeding
$\varepsilon/3$. There are at most $2k_0-1$ such edges. The projected mass
is an $(\alpha,\alpha)$- or $(\alpha,\theta)$-string of beads, dense on the
edge. The dynamics of the growth process in Corollary 21 are such that cut
points on inner edges are selected according to the mass distribution. On
leaf edges, an argument as for subtrees applies. Note also that all edges
added in the growth procedure after step $k_0$ are part of a subtree of
diameter less than $\varepsilon/3$ and hence shorter than $\varepsilon/3$.
Therefore, there a.s. exists $k_1\ge k_0$ such that all edges in
$\mathcal{R}_k$ are shorter than $\varepsilon/3$ for all $k\ge k_1$.

In particular, for all $k>k_1$, we deduce $d(\Sigma_k,\Sigma_k^*)<\varepsilon$
a.s., as required.
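
One way to read this last deduction is as a three-term triangle estimate.
The sketch below assumes a map $\pi$ (hypothetical notation, not from the
source) denoting the projection of $\mathcal{T}$ onto $\mathcal{R}_{k-1}$,
and uses that the two leaves project onto the same edge of
$\mathcal{R}_{k-1}$ by construction.

```latex
% Assumed notation (not from the source): pi is the projection of the CRT
% onto R_{k-1}, with k - 1 >= k_1.
\[
  d(\Sigma_k,\Sigma_k^*)
  \le d(\Sigma_k,\pi(\Sigma_k))
    + d(\pi(\Sigma_k),\pi(\Sigma_k^*))
    + d(\pi(\Sigma_k^*),\Sigma_k^*)
  < \frac{\varepsilon}{3}+\frac{\varepsilon}{3}+\frac{\varepsilon}{3}
  = \varepsilon .
\]
% First and third terms: each leaf lies in a connected component of the
% complement of R_{k-1}, which has diameter less than epsilon/3.
% Middle term: both projections lie on one edge of length less than epsilon/3.
```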

For $\theta=0$, the arguments still apply, but some details are different.
Specifically, the first time a leaf edge is picked, the atom at its top is
selected and spread over a new edge; the original edge is then an internal
edge and the above argument applies. Similarly, the lower bound given for
$q_k$ will vanish if $R_j$ is an interior point of a leaf edge of
$\mathcal{R}_{k-1}$, but we can then proceed in two steps. Specifically, we
first pick this leaf edge after a geometric time, when the mass at its leaf
is spread over a new edge; the original edge is then an internal edge, and
$\mathcal{T}_j$ is attained after a further geometric time with parameter
$m_j$. □

Proof of Theorem 3. The argument given in the proof of Proposition 22 also
shows that $\mathcal{R}_k$ converges to $\mathcal{T}$ a.s. in the Hausdorff
sense, which implies convergence of their isometry classes in the
Gromov-Hausdorff sense. This proves the statement of Theorem 3 for the trees
$\mathcal{R}_k$ constructed in Theorem 4, which assumes the existence of a
CRT $(\mathcal{T},\mu)$ on the given probability space and sufficient extra
randomness to sample repeatedly from $\mu$, as needed for the construction
of $\mathcal{R}_k$.

If $\mathcal{R}_k$, $k\ge 1$, are constructed from an
$(\alpha,\theta)$-tree growth process as in Proposition 2, then we use the
fact that the whole sequence $(\mathcal{R}_k,k\ge 1)$ has the same
distribution as if it were constructed as above. Almost sure convergence in
the Gromov-Hausdorff sense is a property of the distribution on
$\mathbb{T}^{\mathbb{N}}$, where $\mathbb{T}$ denotes the space of isometry
classes of compact real trees. We can define the limiting
$\mathbb{R}$-tree $\mathcal{T}$ as the metric completion of
$\bigcup_{k\ge 1}\mathcal{R}_k$, using the completeness of $\mathbb{T}$. □

Another consequence is that the uniform measure on leaves of $\mathcal{R}_k$
is closely coupled to the uniform measure on leaves of $\mathcal{R}_k^*$,
and hence to the mass measure $\mu$ in the CRT.

Corollary 23. In the setting of Proposition 2, there exists a CRT
$(\mathcal{T},\mu)$ on the same probability space, such that the following
convergences hold:

\[
  (\mathcal{R}_k,\mu_k)\to(\mathcal{T},\mu)
  \quad\text{in the weighted Gromov-Hausdorff sense,}
\]

where $\mu_k$ is the measure identified in Proposition 14(a), and

\[
  (\mathcal{R}_k,\nu_k)\to(\mathcal{T},\mu)
  \quad\text{in the weighted Gromov-Hausdorff sense,}
\]

where $\nu_k$ is the empirical measure on the $k$ leaves of $\mathcal{R}_k$.

Proof. We prove this for the embedded versions of Theorem 4. Since $\mu_k$
is the projection of $\mu$ onto $\mathcal{R}_k\subset\mathcal{T}$, the first
convergence is a direct consequence of Lemma 17 and the proof of Theorem 3.

For the second convergence fix $\varepsilon>0$. Let $k_1\ge 1$ be such that
$\mathcal{T}\setminus\mathcal{R}_{k_1}$ has no subtrees of diameter
exceeding $\varepsilon/9$ and, hence,
$d(\Sigma_k,\Sigma_k^*)<\varepsilon/3$ for $k>k_1$. Let
$k_2\ge 3k_1/\varepsilon$ and $k_3\ge k_2$ be such that
$d_{\rm Haus}(\mathcal{R}_k,\mathcal{T})<\varepsilon$ for $k\ge k_3$. Then
the triangle inequality for the Prohorov distance shows, for
$g=\pi^{\mathcal{R}_k}$ and $f:\mathcal{R}_k\to\mathcal{T}$ the inclusion
map, that

\[
  d_P(f_*\nu_k,\mu)=d_P(\nu_k,\mu)
  \le d_P(\nu_k,\mu_k)+d_P(\mu_k,\mu)<\varepsilon
\]

and

\[
  d_P(\nu_k,g_*\mu)
  \le d_P(\nu_k,\mu_k)+d_P(\mu_k,\mu)+d_P(\mu,g_*\mu)<\varepsilon
\]

for all $k\ge k_2$. This completes the proof. □